diff --git a/.codecov.yml b/.codecov.yml index e2309df738..476ff82fb1 100644 --- a/.codecov.yml +++ b/.codecov.yml @@ -1,4 +1,3 @@ ignore: - - "src/examples/.*" - - "src/tutorials/.*" - "src/helpers/exports/.*" # exclude exports, since there are no CI tests for Asy-exports + - "src/documentation_glossary.jl" # exclude this since it is just used to create the docs and code coverage goes bogus on it. \ No newline at end of file diff --git a/.vale.ini b/.vale.ini index 18596bbabf..ff2bb0bce6 100644 --- a/.vale.ini +++ b/.vale.ini @@ -7,7 +7,6 @@ Packages = Google [formats] # code blocks with Julia in Markdown do not yet work well qmd = md -jl = md [docs/src/*.md] BasedOnStyles = Vale, Google diff --git a/Changelog.md b/Changelog.md index e6c1eb8e1a..71eb8297ac 100644 --- a/Changelog.md +++ b/Changelog.md @@ -5,16 +5,105 @@ All notable Changes to the Julia package `Manopt.jl` will be documented in this The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [0.4.70] WIP - -### Added - +## [0.5.0] – August 29, 2024 + +This breaking update is mainly concerned with providing a unified experience across all solvers +and with some usability improvements, so that, for example, the different gradient update rules are easier to specify. + +In general, we introduce a few factories that avoid having to pass the manifold to keyword arguments. + +### Added + +* A `ManifoldDefaultsFactory` that postpones the creation/allocation of manifold-specific fields in, for example, direction updates, step sizes, and stopping criteria. As a rule of thumb, internal structures, like a solver state, should store the final type. Any high-level interface, like the functions to start solvers, should accept such a factory in the appropriate places and call the internal `_produce_type(factory, M)`, for example before passing something to the state. 
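As an illustration of the factory pattern described in the bullet above, here is a minimal, hypothetical sketch: the `...Sketch` names are made up for this example, and the real `ManifoldDefaultsFactory` in Manopt.jl is more elaborate, but the role of `_produce_type` matches the description.

```julia
# Hypothetical, simplified sketch of the factory pattern; not the actual
# Manopt.jl implementation. A factory stores the constructor and every
# argument except the manifold; `_produce_type` fills the manifold in later.
struct ManifoldDefaultsFactory{F}
    constructor::F      # the internal type to build once the manifold is known
    args::Tuple         # all positional arguments except the manifold
    kwargs::NamedTuple  # all keyword arguments
end

# High-level interfaces call this to turn a factory into the concrete type;
# anything that is not a factory passes through unchanged.
_produce_type(f::ManifoldDefaultsFactory, M) = f.constructor(M, f.args...; f.kwargs...)
_produce_type(s, M) = s

# A toy stepsize that needs the manifold at construction time...
struct ConstantStepsizeSketch
    length::Float64
end
ConstantStepsizeSketch(M, l::Real) = ConstantStepsizeSketch(float(l))

# ...and its manifold-free, user-facing constructor that returns a factory.
ConstantLengthSketch(l::Real) = ManifoldDefaultsFactory(ConstantStepsizeSketch, (l,), (;))

# A user writes `stepsize=ConstantLengthSketch(0.5)`; the solver later fills in M.
stepsize = _produce_type(ConstantLengthSketch(0.5), :Sphere2)
```

This is why the exported names below (`ConstantLength`, `ArmijoLinesearch`, the coefficient rules, and so on) can all be used without explicitly passing the manifold.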
+* A `documentation_glossary.jl` file containing a glossary of often-used variables in fields, arguments, and keywords, to print them in a unified manner. The same holds for common sections, TeX, and math notation that is often used within the doc-strings. + +### Changed + +* Any `Stepsize` now has a `Stepsize` struct used internally, as the original `struct`s were before. The newly exported terms aim to fit `stepsize=...` in naming and create a `ManifoldDefaultsFactory` instead, so that any stepsize can be created without explicitly specifying the manifold. + * `ConstantStepsize` is no longer exported, use `ConstantLength` instead. The length parameter is now a positional argument following the (optional) manifold. Besides that, `ConstantLength` works as before, just that omitting the manifold now fills in the one specified in the solver. + * `DecreasingStepsize` is no longer exported, use `DecreasingLength` instead. `DecreasingLength` works as before, just that omitting the manifold now fills in the one specified in the solver. + * `ArmijoLinesearch` is now called `ArmijoLinesearchStepsize`. `ArmijoLinesearch` works as before, just that omitting the manifold now fills in the one specified in the solver. + * `WolfePowellLinesearch` is now called `WolfePowellLinesearchStepsize`, its constant `c_1` is now unified with Armijo and called `sufficient_decrease`, and `c_2` was renamed to `sufficient_curvature`. Besides that, `WolfePowellLinesearch` works as before, just that omitting the manifold now fills in the one specified in the solver. + * `WolfePowellBinaryLinesearch` is now called `WolfePowellBinaryLinesearchStepsize`, its constant `c_1` is now unified with Armijo and called `sufficient_decrease`, and `c_2` was renamed to `sufficient_curvature`. Besides that, `WolfePowellBinaryLinesearch` works as before, just that omitting the manifold now fills in the one specified in the solver. + * `NonmonotoneLinesearch` is now called `NonmonotoneLinesearchStepsize`. 
`NonmonotoneLinesearch` works as before, just that omitting the manifold now fills in the one specified in the solver. + * `AdaptiveWNGradient` is now called `AdaptiveWNGradientStepsize`. Its second positional argument, the gradient function, was only evaluated once for the `gradient_bound` default, so it has been replaced by the keyword `X=` accepting a tangent vector. The last positional argument `p` has also been moved to a keyword argument. Besides that, `AdaptiveWNGradient` works as before, just that omitting the manifold now fills in the one specified in the solver. +* Any `DirectionUpdateRule` now has `Rule` in its name, since the original name is used to create the `ManifoldDefaultsFactory` instead. The original constructor no longer requires the manifold as a parameter; that is now handled by the factory. The `Rule` is, however, also no longer exported. + * `AverageGradient` is now called `AverageGradientRule`. `AverageGradient` works as before, but the manifold as its first parameter is no longer necessary and `p` is now a keyword argument. + * The `IdentityUpdateRule` now optionally accepts a manifold for consistency, and you can use `Gradient()` for short as well as its factory. Hence `direction=Gradient()` is now available. + * `MomentumGradient` is now called `MomentumGradientRule`. `MomentumGradient` works as before, but the manifold as its first parameter is no longer necessary and `p` is now a keyword argument. + * `Nesterov` is now called `NesterovRule`. `Nesterov` works as before, but the manifold as its first parameter is no longer necessary and `p` is now a keyword argument. + * `ConjugateDescentCoefficient` is now called `ConjugateDescentCoefficientRule`. `ConjugateDescentCoefficient` works as before, but now creates the factory in between. + * the `ConjugateGradientBealeRestart` is now called `ConjugateGradientBealeRestartRule`. 
For the `ConjugateGradientBealeRestart`, the manifold is now an optional first positional argument and no longer the `manifold=` keyword. + * `DaiYuanCoefficient` is now called `DaiYuanCoefficientRule`. For the `DaiYuanCoefficient` the manifold as its first parameter is no longer necessary and the vector transport has been unified/moved to the `vector_transport_method=` keyword. + * `FletcherReevesCoefficient` is now called `FletcherReevesCoefficientRule`. `FletcherReevesCoefficient` works as before, but now creates the factory in between. + * `HagerZhangCoefficient` is now called `HagerZhangCoefficientRule`. For the `HagerZhangCoefficient` the manifold as its first parameter is no longer necessary and the vector transport has been unified/moved to the `vector_transport_method=` keyword. + * `HestenesStiefelCoefficient` is now called `HestenesStiefelCoefficientRule`. For the `HestenesStiefelCoefficient` the manifold as its first parameter is no longer necessary and the vector transport has been unified/moved to the `vector_transport_method=` keyword. + * `LiuStoreyCoefficient` is now called `LiuStoreyCoefficientRule`. For the `LiuStoreyCoefficient` the manifold as its first parameter is no longer necessary and the vector transport has been unified/moved to the `vector_transport_method=` keyword. + * `PolakRibiereCoefficient` is now called `PolakRibiereCoefficientRule`. For the `PolakRibiereCoefficient` the manifold as its first parameter is no longer necessary and the vector transport has been unified/moved to the `vector_transport_method=` keyword. + * the `SteepestDirectionUpdateRule` is now called `SteepestDescentCoefficientRule`. The `SteepestDescentCoefficient` is equivalent, but creates the new factory as an intermediate step. + * `AbstractGradientGroupProcessor` is now called `AbstractGradientGroupDirectionRule`. + * the `StochasticGradient` is now called `StochasticGradientRule`. 
The `StochasticGradient` is equivalent, but creates the new factory as an intermediate step, so that the manifold is no longer necessary. + * the `AlternatingGradient` is now called `AlternatingGradientRule`. + The `AlternatingGradient` is equivalent, but creates the new factory as an intermediate step, so that the manifold is no longer necessary. +* `quasi_Newton` had a keyword `scale_initial_operator=` that was inconsistently declared (sometimes bool, sometimes real) and was unused. + It is now called `initial_scale=1.0` and scales the initial (diagonal, unit) matrix within the approximation of the Hessian in addition to the $\frac{1}{\lVert g_k\rVert}$ scaling with the norm of the oldest gradient for the limited memory variant. For the full matrix variant the initial identity matrix is now scaled with this parameter. * Unify doc strings and presentation of keyword arguments * general indexing, for example in a vector, uses `i` * index for inequality constraints is unified to `i` running from `1,...,m` * index for equality constraints is unified to `j` running from `1,...,n` * iterations now use `k` -* Doc strings unified and even reusing similar docstring snippets. +* `get_manopt_parameter` has been renamed to `get_parameter` since it is internal; + within the package the prefix is clear, and accessing it from outside reads `Manopt.get_parameter` anyway +* `set_manopt_parameter!` has been renamed to `set_parameter!` since it is internal; + within the package the prefix is clear, and accessing it from outside reads `Manopt.set_parameter!` anyway +* changed the `stabilize::Bool=` keyword in `quasi_Newton` to the more flexible `project!=` + keyword; this is also more in line with the other solvers. Internally the same is done + within the `QuasiNewtonLimitedMemoryDirectionUpdate`. 
To adapt, + * the previous `stabilize=true` is now set with `(project!)=embed_project!` in general, + and if the manifold is represented by points in the embedding, like the sphere, `(project!)=project!` suffices + * the new default is `(project!)=copyto!`, so by default no projection/stabilization is performed. +* the positional argument `p` (usually the last, or the third to last if sub solvers existed) has been moved to a keyword argument `p=` in all state constructors +* in `NelderMeadState` the `population` moved from a positional to a keyword argument as well +* the way to initialise sub solvers in the solver states has been unified. In the new variant + * the `sub_problem` is always a positional argument, namely the last one + * if the `sub_state` is given as an optional positional argument after the problem, it has to be a Manopt solver state + * you can provide the new `ClosedFormSolverState(e::AbstractEvaluationType)` for the state + to indicate that the `sub_problem` is a closed-form solution (function call) and how it + has to be called + * if you do not provide the `sub_state` as positional, the keyword `evaluation=` is used + to generate the state `ClosedFormSolverState`. + * where previously `p` and possibly `X` were positional arguments, they are now moved + to keyword arguments of the same name for the start point and tangent vector. + * in detail + * `AdaptiveRegularizationState(M, sub_problem [, sub_state]; kwargs...)` replaces + the (anyway unused) variant that only provided the objective; both `X` and `p` moved to keyword arguments. 
+ * `AugmentedLagrangianMethodState(M, objective, sub_problem; evaluation=...)` was added + * `AugmentedLagrangianMethodState(M, objective, sub_problem, sub_state; evaluation=...)` now has `p=rand(M)` as a keyword argument instead of it being the second positional one + * `ExactPenaltyMethodState(M, sub_problem; evaluation=...)` was added and `ExactPenaltyMethodState(M, sub_problem, sub_state; evaluation=...)` now has `p=rand(M)` as a keyword argument instead of it being the second positional one + * `DifferenceOfConvexState(M, sub_problem; evaluation=...)` was added and `DifferenceOfConvexState(M, sub_problem, sub_state; evaluation=...)` now has `p=rand(M)` as a keyword argument instead of it being the second positional one + * `DifferenceOfConvexProximalState(M, sub_problem; evaluation=...)` was added and `DifferenceOfConvexProximalState(M, sub_problem, sub_state; evaluation=...)` now has `p=rand(M)` as a keyword argument instead of it being the second positional one + * bumped `Manifolds.jl` to version 0.10; this mainly means that any algorithm working on a product manifold and requiring `ArrayPartition` now has to explicitly do `using RecursiveArrayTools`. +### Fixed + +* the `AverageGradientRule` filled its internal vector of gradients wrongly, or mixed it up in parallel transport. This is now fixed. + +### Removed + +* the `convex_bundle_method` and its `ConvexBundleMethodState` no longer accept the keywords `k_size`, `p_estimate` nor `ϱ`; they are superseded by just providing `k_max`. +* the `truncated_conjugate_gradient_descent(M, f, grad_f, hess_f)` now has the Hessian as + a mandatory argument. To use the old variant, + provide `ApproxHessianFiniteDifference(M, copy(M, p), grad_f)` to `hess_f` directly. +* all deprecated keyword arguments and a few function signatures were removed: + * `get_equality_constraints`, `get_equality_constraints!`, `get_inequality_constraints`, `get_inequality_constraints!` are removed. Use their singular forms and set the index to `:` instead. 
+ * `StopWhenChangeLess(ε)` is removed, use `StopWhenChangeLess(M, ε)` instead to fill, for example, the retraction properly used to determine the change + * In the `WolfePowellLinesearch` and `WolfePowellBinaryLinesearch` the `linesearch_stopsize=` keyword is replaced by `stop_when_stepsize_less=` + * `DebugChange` and `RecordChange` had a `manifold=` and an `invretr=` keyword that were replaced by the first positional argument `M` and `inverse_retraction_method=`, respectively + * in the `NonlinearLeastSquaresObjective` and `LevenbergMarquardt` the `jacB=` keyword is now called `jacobian_tangent_basis=` + * in `particle_swarm` the `n=` keyword is replaced by `swarm_size=`. + * `update_stopping_criterion!` has been removed and unified with `set_parameter!`. The code adaptations are + * to set a parameter of a stopping criterion, just replace `update_stopping_criterion!(sc, :Val, v)` with `set_parameter!(sc, :Val, v)` + * to update a stopping criterion in a solver state, replace the old `update_stopping_criterion!(state, :Val, v)` that passed down to the stopping criterion with the explicit pass-down `set_parameter!(state, :StoppingCriterion, :Val, v)` + ## [0.4.69] – August 3, 2024 diff --git a/Project.toml b/Project.toml index 39367819b4..3490fc3f0d 100644 --- a/Project.toml +++ b/Project.toml @@ -1,7 +1,7 @@ name = "Manopt" uuid = "0fc0a36d-df90-57f3-8f93-d78a9fc72bb5" authors = ["Ronny Bergmann "] -version = "0.4.70" +version = "0.5.0" [deps] ColorSchemes = "35d6a980-a343-548e-a6ea-1d62b119f2f4" @@ -27,6 +27,7 @@ LineSearches = "d3d80556-e9d4-5f37-9878-2ab0fcc64255" Manifolds = "1cead3c2-87b3-11e9-0ccd-23c62b72b94e" Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" QuadraticModels = "f468eda6-eac5-11e8-05a5-ff9e497bcd19" +RecursiveArrayTools = "731186ca-8d62-57ce-b412-fbd966d074cd" RipQP = "1e40b3f8-35eb-4cd8-8edd-3e515bb9de08" [extensions] ManoptJuMPExt = "JuMP" ManoptLRUCacheExt = "LRUCache" ManoptLineSearchesExt = "LineSearches" ManoptManifoldsExt = 
"Manifolds" +ManoptRecursiveArrayToolsExt = "RecursiveArrayTools" ManoptRipQPQuadraticModelsExt = ["RipQP", "QuadraticModels"] [compat] @@ -49,21 +51,22 @@ LRUCache = "1.4" LineSearches = "7.2.0" LinearAlgebra = "1.6" ManifoldDiff = "0.3.8" -Manifolds = "0.9.11" +Manifolds = "0.9.11, 0.10" ManifoldsBase = "0.15.10" -ManoptExamples = "0.1.4" +ManoptExamples = "0.1.8" Markdown = "1.6" Plots = "1.30" Preferences = "1.4" Printf = "1.6" QuadraticModels = "0.9" Random = "1.6" +RecursiveArrayTools = "2, 3" Requires = "0.5, 1" RipQP = "0.6.4" SparseArrays = "1.6" Statistics = "1.6" Test = "1.6" -julia = "1.6" +julia = "1.8" [extras] Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595" @@ -76,8 +79,9 @@ Manifolds = "1cead3c2-87b3-11e9-0ccd-23c62b72b94e" ManoptExamples = "5b8d5e80-5788-45cb-83d6-5e8f1484217d" Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" QuadraticModels = "f468eda6-eac5-11e8-05a5-ff9e497bcd19" +RecursiveArrayTools = "731186ca-8d62-57ce-b412-fbd966d074cd" RipQP = "1e40b3f8-35eb-4cd8-8edd-3e515bb9de08" Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40" [targets] -test = ["Test", "Aqua", "ForwardDiff", "JuMP", "Manifolds", "ManoptExamples", "ManifoldDiff", "Plots", "LineSearches", "LRUCache", "RipQP", "QuadraticModels"] +test = ["Test", "Aqua", "ForwardDiff", "JuMP", "Manifolds", "ManoptExamples", "ManifoldDiff", "Plots", "LineSearches", "LRUCache", "RecursiveArrayTools", "RipQP", "QuadraticModels"] diff --git a/docs/Project.toml b/docs/Project.toml index 908a7c3b21..ed984dac25 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -21,6 +21,7 @@ Manopt = "0fc0a36d-df90-57f3-8f93-d78a9fc72bb5" Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" QuadraticModels = "f468eda6-eac5-11e8-05a5-ff9e497bcd19" Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" +RecursiveArrayTools = "731186ca-8d62-57ce-b412-fbd966d074cd" RipQP = "1e40b3f8-35eb-4cd8-8edd-3e515bb9de08" [compat] @@ -39,9 +40,10 @@ JuMP = "1" LRUCache = "1" LineSearches = "7" Literate = "2" -Manifolds = "0.8.81, 0.9" 
+Manifolds = "0.8.81, 0.9, 0.10" ManifoldsBase = "0.14.12, 0.15" -Manopt = "0.4" +Manopt = "0.4, 0.5" Plots = "1" QuadraticModels = "0.9.6" +RecursiveArrayTools = "2, 3" RipQP = "0.6.4" diff --git a/docs/make.jl b/docs/make.jl index 55da2c0db7..6ce52da2f6 100755 --- a/docs/make.jl +++ b/docs/make.jl @@ -47,6 +47,8 @@ if "--quarto" ∈ ARGS tutorials_folder = (@__DIR__) * "/../tutorials" # instantiate the tutorials environment if necessary Pkg.activate(tutorials_folder) + # For a breaking release -> also set the tutorials folder to the most recent version + Pkg.develop(PackageSpec(; path=(@__DIR__) * "/../")) Pkg.resolve() Pkg.instantiate() Pkg.build("IJulia") # build `IJulia` to the right version. @@ -72,7 +74,7 @@ end # (c) load necessary packages for the docs using Documenter using DocumenterCitations, DocumenterInterLinks -using JuMP, LineSearches, LRUCache, Manopt, Manifolds, Plots +using JuMP, LineSearches, LRUCache, Manopt, Manifolds, Plots, RecursiveArrayTools using RipQP, QuadraticModels # (d) add contributing.md to docs diff --git a/docs/src/about.md b/docs/src/about.md index 554307fc8f..d7301eec89 100644 --- a/docs/src/about.md +++ b/docs/src/about.md @@ -8,13 +8,13 @@ This Julia package was started and is currently maintained by [Ronny Bergmann](h Thanks to the following contributors to `Manopt.jl`: * [Constantin Ahlmann-Eltze](https://const-ae.name) implemented the [gradient and differential `check` functions](helpers/checks.md) -* [Renée Dornig](https://github.com/r-dornig) implemented the [particle swarm](solvers/particle_swarm.md), the [Riemannian Augmented Lagrangian Method](solvers/augmented_Lagrangian_method.md), the [Exact Penalty Method](solvers/exact_penalty_method.md), as well as the [`NonmonotoneLinesearch`](@ref) +* [Renée Dornig](https://github.com/r-dornig) implemented the [particle swarm](solvers/particle_swarm.md), the [Riemannian Augmented Lagrangian Method](solvers/augmented_Lagrangian_method.md), the [Exact Penalty 
Method](solvers/exact_penalty_method.md), as well as the [`NonmonotoneLinesearch`](@ref). These solvers are also the first ones with modular/exchangeable sub solvers. * [Willem Diepeveen](https://www.maths.cam.ac.uk/person/wd292) implemented the [primal-dual Riemannian semismooth Newton](solvers/primal_dual_semismooth_Newton.md) solver. -* [Hajg Jasa](https://www.ntnu.edu/employees/hajg.jasa) implemented the [convex bundle method](solvers/convex_bundle_method.md) and the [proximal bundle method](solvers/proximal_bundle_method.md). -* Even Stephansen Kjemsås contributed to the implementation of the [Frank Wolfe Method](solvers/FrankWolfe.md) solver -* Mathias Ravn Munkvold contributed most of the implementation of the [Adaptive Regularization with Cubics](solvers/adaptive-regularization-with-cubics.md) solver -* [Tom-Christian Riemer](https://www.tu-chemnitz.de/mathematik/wire/mitarbeiter.php) implemented the [trust regions](solvers/trust_regions.md) and [quasi Newton](solvers/quasi_Newton.md) solvers. -* [Markus A. Stokkenes](https://www.linkedin.com/in/markus-a-stokkenes-b41bba17b/) contributed most of the implementation of the [Interior Point Newton Method](solvers/interior_point_Newton.md) +* [Hajg Jasa](https://www.ntnu.edu/employees/hajg.jasa) implemented the [convex bundle method](solvers/convex_bundle_method.md) and the [proximal bundle method](solvers/proximal_bundle_method.md) and a default subsolver for each of them. +* Even Stephansen Kjemsås contributed to the implementation of the [Frank Wolfe Method](solvers/FrankWolfe.md) solver. 
+* Mathias Ravn Munkvold contributed most of the implementation of the [Adaptive Regularization with Cubics](solvers/adaptive-regularization-with-cubics.md) solver as well as its [Lanczos](@ref arc-Lanczos) subsolver. +* [Tom-Christian Riemer](https://www.tu-chemnitz.de/mathematik/wire/mitarbeiter.php) implemented the [trust regions](solvers/trust_regions.md) and [quasi Newton](solvers/quasi_Newton.md) solvers as well as the [truncated conjugate gradient descent](solvers/truncated_conjugate_gradient_descent.md) subsolver. +* [Markus A. Stokkenes](https://www.linkedin.com/in/markus-a-stokkenes-b41bba17b/) contributed most of the implementation of the [Interior Point Newton Method](solvers/interior_point_Newton.md) as well as its default [Conjugate Residual](solvers/conjugate_residual.md) subsolver. * [Manuel Weiss](https://scoop.iwr.uni-heidelberg.de/author/manuel-weiß/) implemented most of the [conjugate gradient update rules](@ref cg-coeffs) as well as various [contributors](https://github.com/JuliaManifolds/Manopt.jl/graphs/contributors) providing small extensions, finding small bugs and mistakes and fixing them by opening [PR](https://github.com/JuliaManifolds/Manopt.jl/pulls)s. Thanks to all of you. diff --git a/docs/src/plans/index.md b/docs/src/plans/index.md index 5097c770ef..a408ba1de3 100644 --- a/docs/src/plans/index.md +++ b/docs/src/plans/index.md @@ -16,8 +16,8 @@ By splitting these two parts, one problem can be defined and then be solved using different solvers. Still there might be the need to set certain parameters within any of these structures. For that there is ```@docs -set_manopt_parameter! -get_manopt_parameter +set_parameter! +get_parameter Manopt.status_summary ``` @@ -52,8 +52,17 @@ often also with `:p` and similarly the gradient with `:X`. This is discouraged for both readability as well as to stay more generic, and it is recommended to use `:Iterate` and `:Gradient` instead in generic settings. 
-You can further activate a “Tutorial” mode by `set_manopt_parameter!(:Mode, "Tutorial")`. Internally, the following convenience function is available. +You can further activate a “Tutorial” mode by `set_parameter!(:Mode, "Tutorial")`. Internally, the following convenience function is available. ```@docs Manopt.is_tutorial_mode +``` + +# A factory for providing manifold defaults + +In several cases a manifold might not yet be known at the time a (keyword) argument should be provided. Therefore, any type with a manifold default can be wrapped into a factory. + +```@docs +Manopt.ManifoldDefaultsFactory +Manopt._produce_type ``` \ No newline at end of file diff --git a/docs/src/plans/stepsize.md b/docs/src/plans/stepsize.md index b2862f2f88..02eca3fe71 100644 --- a/docs/src/plans/stepsize.md +++ b/docs/src/plans/stepsize.md @@ -19,11 +19,15 @@ on the manifold currently under consideration. Currently, the following step sizes are available -```@autodocs -Modules = [Manopt] -Pages = ["plans/stepsize.jl"] -Order = [:type,:function] -Filter = t -> t != Stepsize +```@docs +AdaptiveWNGradient +ArmijoLinesearch +ConstantLength +DecreasingLength +NonmonotoneLinesearch +Polyak +WolfePowellLinesearch +WolfePowellBinaryLinesearch ``` Some step sizes use [`max_stepsize`](@ref) function as a rough upper estimate for the trust region size. @@ -33,8 +37,20 @@ Tangent bundle with the Sasaki metric has 0 injectivity radius, so the maximum s `Hyperrectangle` also has 0 injectivity radius and an estimate based on maximum of dimensions along each index is used instead. For manifolds with corners, however, a line search capable of handling break points along the projected search direction should be used, and such algorithms do not call `max_stepsize`. -Some solvers have a different iterate from the one used for the line search. Then the following state can be used to wrap -these locally +Internally the step size functions above create a [`ManifoldDefaultsFactory`](@ref). 
+These factories internally use + +```@autodocs +Modules = [Manopt] +Pages = ["plans/stepsize.jl"] +Private = true +Order = [:function, :type] +Filter = t -> !(t in [Stepsize, AdaptiveWNGradient, ArmijoLinesearch, ConstantLength, DecreasingLength, NonmonotoneLinesearch, Polyak, WolfePowellLinesearch, WolfePowellBinaryLinesearch ]) +``` + + +Some solvers have a different iterate from the one used for the line search. +Then the following state can be used to wrap these locally ```@docs StepsizeState diff --git a/docs/src/solvers/adaptive-regularization-with-cubics.md b/docs/src/solvers/adaptive-regularization-with-cubics.md index 4b6ff3ebc1..7f499f6450 100644 --- a/docs/src/solvers/adaptive-regularization-with-cubics.md +++ b/docs/src/solvers/adaptive-regularization-with-cubics.md @@ -21,7 +21,7 @@ AdaptiveRegularizationState There are several ways to approach the subsolver. The default is the first one. -## Lanczos iteration +## [Lanczos iteration](@id arc-Lanczos) ```@docs Manopt.LanczosState diff --git a/docs/src/solvers/alternating_gradient_descent.md b/docs/src/solvers/alternating_gradient_descent.md index 3dafd303a0..35c96b33ff 100644 --- a/docs/src/solvers/alternating_gradient_descent.md +++ b/docs/src/solvers/alternating_gradient_descent.md @@ -21,8 +21,10 @@ The most inner one should always be the following one though. ```@docs AlternatingGradient +Manopt.AlternatingGradientRule ``` +The `AlternatingGradient` factory internally uses the `Manopt.AlternatingGradientRule` listed above. ## [Technical details](@id sec-agd-technical-details) diff --git a/docs/src/solvers/conjugate_gradient_descent.md b/docs/src/solvers/conjugate_gradient_descent.md index 7126026c8a..9986c512e4 100644 --- a/docs/src/solvers/conjugate_gradient_descent.md +++ b/docs/src/solvers/conjugate_gradient_descent.md @@ -21,15 +21,28 @@ ConjugateGradientDescentState The update rules act as [`DirectionUpdateRule`](@ref), which internally always first evaluate the gradient itself. 
```@docs -ConjugateGradientBealeRestart ConjugateDescentCoefficient +ConjugateGradientBealeRestart DaiYuanCoefficient FletcherReevesCoefficient HagerZhangCoefficient HestenesStiefelCoefficient LiuStoreyCoefficient PolakRibiereCoefficient -SteepestDirectionUpdateRule +SteepestDescentCoefficient +``` + +## Internal rule storages + +```@docs +Manopt.ConjugateGradientBealeRestartRule +Manopt.DaiYuanCoefficientRule +Manopt.FletcherReevesCoefficientRule +Manopt.HagerZhangCoefficientRule +Manopt.HestenesStiefelCoefficientRule +Manopt.LiuStoreyCoefficientRule +Manopt.PolakRibiereCoefficientRule +Manopt.SteepestDescentCoefficientRule ``` ## [Technical details](@id sec-cgd-technical-details) diff --git a/docs/src/solvers/conjugate_residual.md b/docs/src/solvers/conjugate_residual.md index 0ed7f404b7..7723aeff02 100644 --- a/docs/src/solvers/conjugate_residual.md +++ b/docs/src/solvers/conjugate_residual.md @@ -6,6 +6,7 @@ CurrentModule = Manopt ```@docs conjugate_residual +conjugate_residual! ``` ## State diff --git a/docs/src/solvers/gradient_descent.md b/docs/src/solvers/gradient_descent.md index e7638ca977..98758a0bd7 100644 --- a/docs/src/solvers/gradient_descent.md +++ b/docs/src/solvers/gradient_descent.md @@ -5,8 +5,8 @@ CurrentModule = Manopt ``` ```@docs - gradient_descent - gradient_descent! +gradient_descent +gradient_descent! 
``` ## State @@ -20,13 +20,23 @@ GradientDescentState A field of the options is the `direction`, a [`DirectionUpdateRule`](@ref), which by default [`IdentityUpdateRule`](@ref) just evaluates the gradient but can be enhanced for example to ```@docs +AverageGradient DirectionUpdateRule IdentityUpdateRule MomentumGradient -AverageGradient Nesterov ``` +which internally use the [`ManifoldDefaultsFactory`](@ref) and produce the internal +elements + +```@docs +Manopt.AverageGradientRule +Manopt.ConjugateDescentCoefficientRule +Manopt.MomentumGradientRule +Manopt.NesterovRule +``` + ## Debug actions ```@docs diff --git a/docs/src/solvers/stochastic_gradient_descent.md b/docs/src/solvers/stochastic_gradient_descent.md index 5d9888e6e3..a7637f573b 100644 --- a/docs/src/solvers/stochastic_gradient_descent.md +++ b/docs/src/solvers/stochastic_gradient_descent.md @@ -13,6 +13,7 @@ stochastic_gradient_descent! ```@docs StochasticGradientDescentState +Manopt.default_stepsize(::AbstractManifold, ::Type{StochasticGradientDescentState}) ``` Additionally, the options share a [`DirectionUpdateRule`](@ref), @@ -20,10 +21,16 @@ so you can also apply [`MomentumGradient`](@ref) and [`AverageGradient`](@ref) h The most inner one should always be. 
```@docs -AbstractGradientGroupProcessor StochasticGradient ``` +which internally uses + +```@docs +AbstractGradientGroupDirectionRule +StochasticGradientRule +``` + ## [Technical details](@id sec-sgd-technical-details) The [`stochastic_gradient_descent`](@ref) solver requires the following functions of a manifold to be available diff --git a/docs/src/solvers/truncated_conjugate_gradient_descent.md b/docs/src/solvers/truncated_conjugate_gradient_descent.md index a873638199..aabcacd116 100644 --- a/docs/src/solvers/truncated_conjugate_gradient_descent.md +++ b/docs/src/solvers/truncated_conjugate_gradient_descent.md @@ -34,8 +34,8 @@ StopWhenResidualIsReducedByFactorOrPower StopWhenTrustRegionIsExceeded StopWhenCurvatureIsNegative StopWhenModelIncreased -update_stopping_criterion!(::StopWhenResidualIsReducedByFactorOrPower, ::Val{:ResidualPower}, ::Any) -update_stopping_criterion!(::StopWhenResidualIsReducedByFactorOrPower, ::Val{:ResidualFactor}, ::Any) +Manopt.set_parameter!(::StopWhenResidualIsReducedByFactorOrPower, ::Val{:ResidualPower}, ::Any) +Manopt.set_parameter!(::StopWhenResidualIsReducedByFactorOrPower, ::Val{:ResidualFactor}, ::Any) ``` ## Trust region model diff --git a/ext/ManoptJuMPExt.jl b/ext/ManoptJuMPExt.jl index 2f75604c10..5304a52a12 100644 --- a/ext/ManoptJuMPExt.jl +++ b/ext/ManoptJuMPExt.jl @@ -353,7 +353,7 @@ function MOI.optimize!(model::Optimizer) kws = Dict{Symbol,Any}( Symbol(key) => value for (key, value) in model.options if key != DESCENT_STATE_TYPE ) - s = descent_state_type(model.manifold, reshaped_start; kws...) + s = descent_state_type(model.manifold; p=reshaped_start, kws...) 
model.state = decorate_state!(s) solve!(model.problem, model.state) return nothing diff --git a/ext/ManoptLRUCacheExt.jl b/ext/ManoptLRUCacheExt.jl index 6e2274f3cb..f416f50f42 100644 --- a/ext/ManoptLRUCacheExt.jl +++ b/ext/ManoptLRUCacheExt.jl @@ -27,7 +27,7 @@ Given a vector of symbols `caches`, this function sets up the # Keyword arguments -* `p=`$(Manopt._link_rand()): a point on a manifold, to both infer its type for keys and initialize caches +* `p=`$(Manopt._link(:rand)): a point on a manifold, to both infer its type for keys and initialize caches * `value=0.0`: a value both typing and initialising number-caches, the default is for (Float) values like the cost. * `X=zero_vector(M, p)`: diff --git a/ext/ManoptManifoldsExt/ChambollePockManifolds.jl b/ext/ManoptManifoldsExt/ChambollePockManifolds.jl index 3ff7ec25b9..1710eb3d7f 100644 --- a/ext/ManoptManifoldsExt/ChambollePockManifolds.jl +++ b/ext/ManoptManifoldsExt/ChambollePockManifolds.jl @@ -1,11 +1,3 @@ -function Manopt.ChambollePockState( - M::AbstractManifold, - m::P, - n::Q, - p::P, - X::T; - N::AbstractManifold=TangentBundle(M), - kwargs..., -) where {P,Q,T} - return ChambollePockState(M, N, m, n, p, X; kwargs...) +function Manopt.ChambollePockState(M::AbstractManifold; kwargs...) + return ChambollePockState(M, TangentBundle(M); kwargs...) 
end diff --git a/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl b/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl index fc62c4ea4e..405c299111 100644 --- a/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl +++ b/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl @@ -2,19 +2,12 @@ module ManoptManifoldsExt using ManifoldsBase: exp, log, ParallelTransport, vector_transport_to using Manopt -using Manopt: - _l_refl, - _l_retr, - _kw_retraction_method_default, - _kw_inverse_retraction_method_default, - _kw_X_default +using Manopt: _tex, _var, ManifoldDefaultsFactory, _produce_type import Manopt: max_stepsize, - alternating_gradient_descent, - alternating_gradient_descent!, get_gradient, get_gradient!, - set_manopt_parameter!, + set_parameter!, reflect, reflect!, Rn, @@ -37,5 +30,4 @@ Rn(::Val{:Manifolds}, args...; kwargs...) = Euclidean(args...; kwargs...) const NONMUTATINGMANIFOLDS = Union{Circle,PositiveNumbers,Euclidean{Tuple{}}} include("manifold_functions.jl") include("ChambollePockManifolds.jl") -include("alternating_gradient.jl") end diff --git a/ext/ManoptManifoldsExt/manifold_functions.jl b/ext/ManoptManifoldsExt/manifold_functions.jl index f921aad698..1c3d4332f2 100644 --- a/ext/ManoptManifoldsExt/manifold_functions.jl +++ b/ext/ManoptManifoldsExt/manifold_functions.jl @@ -115,24 +115,22 @@ end Reflect the point `x` from the manifold `M` at point `p`, given by ```math -$_l_refl +$(_tex(:reflect)) ``` -where ``$_l_retr`` and ``$_l_retr^{-1}`` denote a retraction and an inverse +where ``$(_tex(:retr))`` and ``$(_tex(:invretr))`` denote a retraction and an inverse retraction, respectively. This can also be done in place of `q`. 
## Keyword arguments -* $_kw_retraction_method_default - the retraction to use in the reflection -* $_kw_inverse_retraction_method_default - the inverse retraction to use within the reflection +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :inverse_retraction_method)) and for the `reflect!` additionally -* $_kw_X_default - a temporary memory to compute the inverse retraction in place. +$(_var(:Keyword, :X)) + as temporary memory to compute the inverse retraction in place. otherwise this is the memory that would be allocated anyways. """ function reflect( diff --git a/ext/ManoptManifoldsExt/alternating_gradient.jl b/ext/ManoptRecursiveArrayToolsExt.jl similarity index 84% rename from ext/ManoptManifoldsExt/alternating_gradient.jl rename to ext/ManoptRecursiveArrayToolsExt.jl index 252f916419..aa627e7923 100644 --- a/ext/ManoptManifoldsExt/alternating_gradient.jl +++ b/ext/ManoptRecursiveArrayToolsExt.jl @@ -1,3 +1,21 @@ +module ManoptRecursiveArrayToolsExt +using Manopt +using ManifoldsBase +using ManifoldsBase: submanifold_components +import Manopt: + max_stepsize, + alternating_gradient_descent, + alternating_gradient_descent!, + get_gradient, + get_gradient!, + set_parameter! 
+using Manopt: _tex, _var, ManifoldDefaultsFactory, _produce_type + +if isdefined(Base, :get_extension) + using RecursiveArrayTools +else + using ..RecursiveArrayTools +end @doc raw""" X = get_gradient(M::ProductManifold, ago::ManifoldAlternatingGradientObjective, p) @@ -88,7 +106,9 @@ function alternating_gradient_descent!( inner_iterations::Int=5, stopping_criterion::StoppingCriterion=StopAfterIteration(100) | StopWhenGradientNormLess(1e-9), - stepsize::Stepsize=default_stepsize(M, AlternatingGradientDescentState), + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=default_stepsize( + M, AlternatingGradientDescentState + ), order_type::Symbol=:Linear, order=collect(1:length(M.manifolds)), retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), @@ -97,11 +117,11 @@ function alternating_gradient_descent!( dagmo = Manopt.decorate_objective!(M, agmo; kwargs...) dmp = DefaultManoptProblem(M, dagmo) agds = AlternatingGradientDescentState( - M, - p; + M; + p=p, inner_iterations=inner_iterations, stopping_criterion=stopping_criterion, - stepsize=stepsize, + stepsize=_produce_type(stepsize, M), order_type=order_type, order=order, retraction_method=retraction_method, @@ -110,3 +130,4 @@ function alternating_gradient_descent!( Manopt.solve!(dmp, agds) return Manopt.get_solver_return(get_objective(dmp), agds) end +end diff --git a/joss/paper.md b/joss/paper.md index 6b909c718a..21cab8dabc 100644 --- a/joss/paper.md +++ b/joss/paper.md @@ -65,7 +65,7 @@ In the current version 0.3.17 of `Manopt.jl` the following algorithms are availa * Alternating Gradient Descent ([`alternating_gradient_descent`](https://manoptjl.org/v0.3/solvers/alternating_gradient_descent.html)) * Chambolle-Pock ([`ChambollePock`](https://manoptjl.org/v0.3/solvers/ChambollePock.html)) [@BergmannHerzogSilvaLouzeiroTenbrinckVidalNunez:2021:1] * Conjugate Gradient Descent ([`conjugate_gradient_descent`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html)), which 
includes eight direction update rules using the `coefficient` keyword: - [`SteepestDirectionUpdateRule`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.SteepestDirectionUpdateRule), [`ConjugateDescentCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.ConjugateDescentCoefficient). [`DaiYuanCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.DaiYuanCoefficient), [`FletcherReevesCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.FletcherReevesCoefficient), [`HagerZhangCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.HagerZhangCoefficient), [`HeestenesStiefelCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.HeestenesStiefelCoefficient), [`LiuStoreyCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.LiuStoreyCoefficient), and [`PolakRibiereCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.PolakRibiereCoefficient) + [`SteepestDirectionUpdateRule`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.SteepestDirectionUpdateRule), [`ConjugateDescentCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.ConjugateDescentCoefficient). 
[`DaiYuanCoefficientRule`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.DaiYuanCoefficientRule), [`FletcherReevesCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.FletcherReevesCoefficient), [`HagerZhangCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.HagerZhangCoefficient), [`HeestenesStiefelCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.HeestenesStiefelCoefficient), [`LiuStoreyCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.LiuStoreyCoefficient), and [`PolakRibiereCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.PolakRibiereCoefficient) * Cyclic Proximal Point ([`cyclic_proximal_point`](https://manoptjl.org/v0.3/solvers/cyclic_proximal_point.html)) [@Bacak:2014:1] * (parallel) Douglas—Rachford ([`DouglasRachford`](https://manoptjl.org/v0.3/solvers/DouglasRachford.html)) [@BergmannPerschSteidl:2016:1] * Gradient Descent ([`gradient_descent`](https://manoptjl.org/v0.3/solvers/gradient_descent.html)), including direction update rules ([`IdentityUpdateRule`](https://manoptjl.org/v0.3/solvers/gradient_descent.html#Manopt.IdentityUpdateRule) for the classical gradient descent) to perform [`MomentumGradient`](https://manoptjl.org/v0.3/solvers/gradient_descent.html#Manopt.MomentumGradient), [`AverageGradient`](https://manoptjl.org/v0.3/solvers/gradient_descent.html#Manopt.AverageGradient), and [`Nesterov`](https://manoptjl.org/v0.3/solvers/gradient_descent.html#Manopt.Nesterov) types diff --git a/src/Manopt.jl b/src/Manopt.jl index b0c7043a62..a015e5a3a8 100644 --- a/src/Manopt.jl +++ b/src/Manopt.jl @@ -149,6 +149,8 @@ using Requires using SparseArrays using Statistics +include("documentation_glossary.jl") + """ Rn(args; kwargs...) Rn(s::Symbol=:Manifolds, args; kwargs...) 
@@ -251,7 +253,7 @@ function __init__() # # Error Hints # - @static if isdefined(Base.Experimental, :register_error_hint) + @static if isdefined(Base.Experimental, :register_error_hint) # COV_EXCL_LINE Base.Experimental.register_error_hint(MethodError) do io, exc, argtypes, kwargs if exc.f === convex_bundle_method_subsolver print( @@ -272,13 +274,16 @@ function __init__() # # Requires fallback for Julia < 1.9 # - @static if !isdefined(Base, :get_extension) + @static if !isdefined(Base, :get_extension) # COV_EXCL_LINE @require JuMP = "4076af6c-e467-56ae-b986-b466b2749572" begin include("../ext/ManoptJuMPExt.jl") end @require Manifolds = "1cead3c2-87b3-11e9-0ccd-23c62b72b94e" begin include("../ext/ManoptManifoldsExt/ManoptManifoldsExt.jl") end + @require RecursiveArrayTools = "731186ca-8d62-57ce-b412-fbd966d074cd" begin + include("../ext/ManoptRecursiveArrayToolsExt.jl") + end @require LineSearches = "d3d80556-e9d4-5f37-9878-2ab0fcc64255" begin include("../ext/ManoptLineSearchesExt.jl") end @@ -448,10 +453,9 @@ export has_storage, get_storage, update_storage! 
export objective_cache_factory # # Direction Update Rules -export DirectionUpdateRule, - IdentityUpdateRule, StochasticGradient, AverageGradient, MomentumGradient, Nesterov -export DirectionUpdateRule, - SteepestDirectionUpdateRule, +export DirectionUpdateRule +export Gradient, StochasticGradient, AverageGradient, MomentumGradient, Nesterov +export SteepestDescentCoefficient, HestenesStiefelCoefficient, FletcherReevesCoefficient, PolakRibiereCoefficient, @@ -527,7 +531,7 @@ export SmoothingTechnique, LinearQuadraticHuber, LogarithmicSumOfExponentials # # Stepsize export Stepsize -export AdaptiveWNGradient, ConstantStepsize, DecreasingStepsize, PolyakStepsize +export AdaptiveWNGradient, ConstantLength, DecreasingLength, Polyak export ArmijoLinesearch, Linesearch, NonmonotoneLinesearch export get_stepsize, get_initial_stepsize, get_last_stepsize export InteriorPointCentralityCondition @@ -567,7 +571,6 @@ export StopAfter, StopWhenTrustRegionIsExceeded export get_active_stopping_criteria, get_stopping_criteria, get_reason, get_stopping_criterion -export update_stopping_criterion! # # Exports export asymptote_export_S2_signals, asymptote_export_S2_data, asymptote_export_SPD diff --git a/src/deprecated.jl b/src/deprecated.jl index 03e7a566d9..8b13789179 100644 --- a/src/deprecated.jl +++ b/src/deprecated.jl @@ -1,15 +1 @@ -Base.@deprecate_binding HeestenesStiefelCoefficient HestenesStiefelCoefficient -export HeestenesStiefelCoefficient -Base.@deprecate_binding SimpleCacheObjective SimpleManifoldCachedObjective -export SimpleCacheObjective -# -# Deprecated tCG calls -# -@deprecate truncated_conjugate_gradient_descent( - M::AbstractManifold, F, gradF, x, Y, H::TH; kwargs... -) where {TH<:Function} truncated_conjugate_gradient_descent(M, F, gradF, H, x, Y; kwargs...) -@deprecate truncated_conjugate_gradient_descent!( - M::AbstractManifold, F::TF, gradF::TG, x, Y, H::TH; kwargs... 
-) where {TF<:Function,TG<:Function,TH<:Function} truncated_conjugate_gradient_descent!(
-    M, F, gradF, H, x, Y; kwargs...
-)
+
diff --git a/src/documentation_glossary.jl b/src/documentation_glossary.jl
new file mode 100644
index 0000000000..b30b25f268
--- /dev/null
+++ b/src/documentation_glossary.jl
@@ -0,0 +1,492 @@
+#
+# Manopt Glossary
+# ===
+#
+# This file collects
+# * LaTeX snippets
+# * math formulae
+# * Variable names
+# * links
+# * notes
+#
+# to keep naming, notation, and formatting consistent
+
+# In general every dictionary here can be either :Symbol -> String or :Symbol -> Dictionary entries
+
+_MANOPT_DOC_TYPE = Dict{Symbol,Union{String,Dict,Function}}
+
+_manopt_glossary = _MANOPT_DOC_TYPE()
+
+# easier access functions
+"""
+    glossary(s::Symbol, args...; kwargs...)
+    glossary(g::Dict, s::Symbol, args...; kwargs...)
+
+Access an entry in the glossary at `Symbol` `s`.
+If that entry is
+* a string, this is returned
+* a function, it is called with `args...` and `kwargs...` passed
+* a dictionary, then the arguments and keyword arguments are passed to this dictionary, assuming `args[1]` is a symbol
+"""
+# do not document for now, until we have an internals section
+glossary(s::Symbol, args...; kwargs...) = glossary(_manopt_glossary, s, args...; kwargs...)
+function glossary(g::_MANOPT_DOC_TYPE, s::Symbol, args...; kwargs...)
+    return glossary(g[s], args...; kwargs...)
+end
+glossary(s::String, args...; kwargs...) = s
+glossary(f::Function, args...; kwargs...) = f(args...; kwargs...)
+
+define!(s::Symbol, args...) = define!(_manopt_glossary, s, args...)
+function define!(g::_MANOPT_DOC_TYPE, s::Symbol, e::Union{String,Function})
+    g[s] = e
+    return g
+end
+function define!(g::_MANOPT_DOC_TYPE, s1::Symbol, s2::Symbol, args...)
+    !(haskey(g, s1)) && (g[s1] = _MANOPT_DOC_TYPE())
+    define!(g[s1], s2, args...)
+ return g +end + +# --- +# LaTeX + +define!(:LaTeX, :argmin, raw"\operatorname{arg\,min}") +define!(:LaTeX, :ast, raw"\ast") +define!(:LaTeX, :bar, (letter) -> raw"\bar" * "$(letter)") +define!(:LaTeX, :big, raw"\big") +define!(:LaTeX, :bigl, raw"\bigl") +define!(:LaTeX, :bigr, raw"\bigr") +define!(:LaTeX, :Big, raw"\Big") +define!(:LaTeX, :Bigl, raw"\Bigl") +define!(:LaTeX, :Bigr, raw"\Bigr") +define!(:LaTeX, :Cal, (letter) -> raw"\mathcal " * "$letter") +define!( + :LaTeX, + :cases, + (c...) -> + raw"\begin{cases}" * + "\n" * + "$(join([" $(ci)" for ci in c], raw"\\\\"*"\n"))" * + "\n" * + raw"\end{cases}", +) +define!(:LaTeX, :deriv, (t = "t") -> raw"\frac{\mathrm{d}}{\mathrm{d}" * "$(t)" * "}") +define!(:LaTeX, :displaystyle, raw"\displaystyle") +define!(:LaTeX, :frac, (a, b) -> raw"\frac" * "{$a}{$b}") +define!(:LaTeX, :grad, raw"\operatorname{grad}") +define!(:LaTeX, :hat, (letter) -> raw"\hat{" * "$letter" * "}") +define!(:LaTeX, :Hess, raw"\operatorname{Hess}") +define!(:LaTeX, :invretr, raw"\operatorname{retr}^{-1}") +define!(:LaTeX, :log, raw"\log") +define!(:LaTeX, :max, raw"\max") +define!(:LaTeX, :min, raw"\min") +define!(:LaTeX, :norm, (v; index = "") -> raw"\lVert " * "$v" * raw" \rVert" * "_{$index}") +define!(:LaTeX, :prox, raw"\operatorname{prox}") +define!(:LaTeX, :quad, raw"\quad") +define!(:LaTeX, :reflect, raw"\operatorname{refl}") +define!(:LaTeX, :retr, raw"\operatorname{retr}") +define!(:LaTeX, :subgrad, raw"∂") +define!(:LaTeX, :text, (letter) -> raw"\text{" * "$letter" * "}") +define!(:LaTeX, :vert, raw"\vert") +define!(:LaTeX, :widehat, (letter) -> raw"\widehat{" * "$letter" * "}") +_tex(args...; kwargs...) = glossary(:LaTeX, args...; kwargs...) 
+#
+# ---
+# Mathematics and semantic symbols
+# :symbol the symbol,
+# :description the description
+define!(:Math, :M, (; M="M") -> _math(:Manifold, :symbol; M=M))
+define!(:Math, :Manifold, :symbol, (; M="M") -> _tex(:Cal, M))
+define!(:Math, :Manifold, :description, "the Riemannian manifold")
+define!(:Math, :Iterate, (; p="p", k="k") -> "$(p)^{($(k))}")
+define!(
+    :Math,
+    :Sequence,
+    (var, ind, from, to) -> raw"\{" * "$(var)_$(ind)" * raw"\}" * "_{$(ind)=$from}^{$to}",
+)
+define!(:Math, :TM, (; M="M") -> _math(:TangentBundle, :symbol; M=M))
+define!(:Math, :TangentBundle, :symbol, (; M="M") -> "T$(_tex(:Cal, M))")
+define!(
+    :Math,
+    :TangentBundle,
+    :description,
+    (; M="M") -> "the tangent bundle of the manifold ``$(_math(:M; M=M))``",
+)
+define!(:Math, :TpM, (; M="M", p="p") -> _math(:TangentSpace, :symbol; M=M, p=p))
+define!(:Math, :TangentSpace, :symbol, (; M="M", p="p") -> "T_{$p}$(_tex(:Cal, M))")
+define!(
+    :Math,
+    :TangentSpace,
+    :description,
+    (; M="M", p="p") ->
+        "the tangent space at the point ``p`` on the manifold ``$(_math(:M; M=M))``",
+)
+define!(
+    :Math, :vector_transport, :symbol, (a="⋅", b="⋅") -> raw"\mathcal T_{" * "$a←$b" * "}"
+)
+define!(:Math, :vector_transport, :name, "the vector transport")
+_math(args...; kwargs...) = glossary(:Math, args...; kwargs...)
+
+#
+# ---
+# Links
+# Collect short forms for links, especially Interdocs ones.
+_link(args...; kwargs...) = glossary(:Link, args...; kwargs...)
+
+define!(
+    :Link,
+    :AbstractManifold,
+    "[`AbstractManifold`](@extref `ManifoldsBase.AbstractManifold`)",
+)
+define!(
+    :Link,
+    :manifold_dimension,
+    (; M="M") ->
+        "[`manifold_dimension`](@extref `ManifoldsBase.manifold_dimension-Tuple{AbstractManifold}`)$(length(M) > 0 ?
"`($M)`" : "")", +) +define!(:Link, :Manopt, "[`Manopt.jl`](https://manoptjl.org)") +define!( + :Link, :Manifolds, "[`Manifolds.jl`](https://juliamanifolds.github.io/Manifolds.jl/)" +) +define!( + :Link, + :ManifoldsBase, + "[`ManifoldsBase.jl`](https://juliamanifolds.github.io/ManifoldsBase.jl/)", +) +define!( + :Link, + :rand, + (; M="M") -> + "[`rand`](@extref Base.rand-Tuple{AbstractManifold})$(length(M) > 0 ? "`($M)`" : "")", +) +define!( + :Link, + :zero_vector, + (; M="M", p="p") -> + "[`zero_vector`](@extref `ManifoldsBase.zero_vector-Tuple{AbstractManifold, Any}`)$(length(M) > 0 ? "`($M, $p)`" : "")", +) + +# +# +# Notes / Remarks +_note(args...; kwargs...) = glossary(:Note, args...; kwargs...) +define!( + :Note, + :ManifoldDefaultFactory, + (type::String) -> """ +!!! info + This function generates a [`ManifoldDefaultsFactory`](@ref) for [`$(type)`](@ref). + For default values, that depend on the manifold, this factory postpones the construction + until the manifold from for example a corresponding [`AbstractManoptSolverState`](@ref) is available. +""", +) +define!( + :Note, + :GradientObjective, + (; objective="gradient_objective", f="f", grad_f="grad_f") -> """ +Alternatively to `$f` and `$grad_f` you can provide +the corresponding [`AbstractManifoldGradientObjective`](@ref) `$objective` directly. +""", +) +define!( + :Note, + :OtherKeywords, + "All other keyword arguments are passed to [`decorate_state!`](@ref) for state decorators or [`decorate_objective!`](@ref) for objective decorators, respectively.", +) +define!( + :Note, + :OutputSection, + (; p_min="p^*") -> """ +# Output + +The obtained approximate minimizer ``$(p_min)``. +To obtain the whole final state of the solver, see [`get_solver_return`](@ref) for details, especially the `return_state=` keyword. +""", +) +define!( + :Note, + :TutorialMode, + "If you activate tutorial mode (cf. 
[`is_tutorial_mode`](@ref)), this solver provides additional debug warnings.", +) +define!( + :Note, + :KeywordUsedIn, + function (kw::String) + return "This is used to define the `$(kw)=` keyword and has hence no effect, if you set `$(kw)` directly." + end, +) + +define!( + :Problem, + :Constrained, + (; M="M", p="p") -> """ + ```math +\\begin{aligned} +\\min_{$p ∈ $(_tex(:Cal, M))} & f($p)\\\\ +$(_tex(:text, "subject to")) &g_i($p) ≤ 0 \\quad $(_tex(:text, " for ")) i= 1, …, m,\\\\ +\\quad & h_j($p)=0 \\quad $(_tex(:text, " for ")) j=1,…,n, +\\end{aligned} +``` +""", +) +define!( + :Problem, + :Default, + (; M="M", p="p") -> "\n```math\n$(_tex(:argmin))_{$p ∈ $(_tex(:Cal, M))} f($p)\n```\n", +) +_problem(args...; kwargs...) = glossary(:Problem, args...; kwargs...) +# +# +# Stopping Criteria +define!(:StoppingCriterion, :Any, "[` | `](@ref StopWhenAny)") +define!(:StoppingCriterion, :All, "[` & `](@ref StopWhenAll)") +_sc(args...; kwargs...) = glossary(:StoppingCriterion, args...; kwargs...) + +# +# +# --- +# Variables +# in fields, keyword arguments, parameters +# for each variable as a symbol, we store +# The variable name should be the symbol +# :default – in positional or keyword arguments +# :description – a text description of the variable +# :type a type +_var(args...; kwargs...) = glossary(:Variable, args...; kwargs...) + +#Meta: How to format an argument, a field of a struct, and a keyword +define!( + :Variable, + :Argument, + # for the symbol s (possibly with special type t) + # d is the symbol to display, for example for Argument p that in some signatures is called q + # t is its type (if different from the default _var(s, :type)) + # type= determines whether to display the default or passed type + # add=[] adds more information from _var(s, add[i]; kwargs...) sub fields + function (s::Symbol, d="$s", t=""; type=false, add="", kwargs...) + # create type to display + disp_type = type ? "::$(length(t) > 0 ? 
t : _var(s, :type))" : ""
+        addv = !isa(add, Vector) ? [add] : add
+        disp_add = join([a isa Symbol ? _var(s, a; kwargs...) : "$a" for a in addv])
+        return "* `$(d)$(disp_type)`: $(_var(s, :description; kwargs...))$(disp_add)"
+    end,
+)
+define!(
+    :Variable,
+    :Field,
+    function (s::Symbol, d="$s", t=""; type=true, add="", kwargs...)
+        disp_type = type ? "::$(length(t) > 0 ? t : _var(s, :type))" : ""
+        addv = !isa(add, Vector) ? [add] : add
+        disp_add = join([a isa Symbol ? _var(s, a; kwargs...) : "$a" for a in addv])
+        return "* `$(d)$(disp_type)`: $(_var(s, :description; kwargs...))$(disp_add)"
+    end,
+)
+define!(
+    :Variable,
+    :Keyword,
+    function (
+        s::Symbol,
+        display="$s",
+        t="";
+        default="",
+        add="",
+        type=false,
+        description::Bool=true,
+        kwargs...,
+    )
+        addv = !isa(add, Vector) ? [add] : add
+        disp_add = join([a isa Symbol ? _var(s, a; kwargs...) : "$a" for a in addv])
+        return "* `$(display)$(type ? "::$(length(t) > 0 ? t : _var(s, :type))" : "")=`$(length(default) > 0 ? default : _var(s, :default; kwargs...))$(description ? ": $(_var(s, :description; kwargs...))" : "")$(disp_add)"
+    end,
+)
+#
+# Actual variables
+
+define!(
+    :Variable,
+    :at_iteration,
+    :description,
+    "an integer indicating at which iteration the stopping criterion last indicated to stop, which might also be before the solver started (`0`). Any negative value indicates that this was not yet the case;",
+)
+define!(:Variable, :at_iteration, :type, "Int")
+
+define!(
+    :Variable,
+    :evaluation,
+    :description,
+    "specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result ([`AllocatingEvaluation`](@ref)) or whether they modify their input argument to return the result therein ([`InplaceEvaluation`](@ref)).
Since usually the first argument is the manifold, the modified argument is the second.", +) +define!(:Variable, :evaluation, :type, "AbstractEvaluationType") +define!(:Variable, :evaluation, :default, "[`AllocatingEvaluation`](@ref)`()`") +define!( + :Variable, + :evaluation, + :GradientExample, + "For example `grad_f(M,p)` allocates, but `grad_f!(M, X, p)` computes the result in-place of `X`.", +) + +define!( + :Variable, + :f, + :description, + function (; M="M", p="p") + return "a cost function ``f: $(_tex(:Cal, M))→ ℝ`` implemented as `($M, $p) -> v`" + end, +) + +define!( + :Variable, + :grad_f, + :description, + (; M="M", p="p") -> + "the (Riemannian) gradient ``$(_tex(:grad))f``: $(_math(:M, M=M)) → $(_math(:TpM; M=M, p=p)) of f as a function `(M, p) -> X` or a function `(M, X, p) -> X` computing `X` in-place", +) + +define!( + :Variable, + :Hess_f, + :description, + (; M="M", p="p") -> + "the (Riemannian) Hessian ``$(_tex(:Hess))f``: $(_math(:TpM, M=M, p=p)) → $(_math(:TpM; M=M, p=p)) of f as a function `(M, p, X) -> Y` or a function `(M, Y, p, X) -> Y` computing `Y` in-place", +) + +define!( + :Variable, + :inverse_retraction_method, + :description, + "an inverse retraction ``$(_tex(:invretr))`` to use, see [the section on retractions and their inverses](@extref ManifoldsBase :doc:`retractions`)", +) +define!(:Variable, :inverse_retraction_method, :type, "AbstractInverseRetractionMethod") +define!( + :Variable, + :inverse_retraction_method, + :default, + (; M="M", p="p") -> + "[`default_inverse_retraction_method`](@extref `ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}`)`($M, typeof($p))`", +) + +define!( + :Variable, :M, :description, (; M="M") -> "a Riemannian manifold ``$(_tex(:Cal, M))``" +) +define!(:Variable, :M, :type, "`$(_link(:AbstractManifold))` ") + +define!( + :Variable, :p, :description, (; M="M") -> "a point on the manifold ``$(_tex(:Cal, M))``" +) +define!(:Variable, :p, :type, "P") +define!(:Variable, :p, :default, 
(; M="M") -> _link(:rand; M=M))
+define!(:Variable, :p, :as_Iterate, "storing the current iterate")
+define!(:Variable, :p, :as_Initial, "to specify the initial value")
+
+define!(
+    :Variable,
+    :retraction_method,
+    :description,
+    "a retraction ``$(_tex(:retr))`` to use, see [the section on retractions](@extref ManifoldsBase :doc:`retractions`)",
+)
+define!(:Variable, :retraction_method, :type, "AbstractRetractionMethod")
+define!(
+    :Variable,
+    :retraction_method,
+    :default,
+    "[`default_retraction_method`](@extref `ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}`)`(M, typeof(p))`",
+)
+
+define!(
+    :Variable,
+    :stepsize,
+    :description,
+    (; M="M") -> "a functor inheriting from [`Stepsize`](@ref) to determine a step size",
+)
+define!(:Variable, :stepsize, :type, "Stepsize")
+
+define!(
+    :Variable,
+    :stopping_criterion,
+    :description,
+    (; M="M") -> "a functor indicating that the stopping criterion is fulfilled",
+)
+define!(:Variable, :stopping_criterion, :type, "StoppingCriterion")
+
+define!(
+    :Variable,
+    :sub_problem,
+    :description,
+    (; M="M") ->
+        "specify a problem for a solver or a closed form solution function, which can be allocating or in-place.",
+)
+define!(:Variable, :sub_problem, :type, "Union{AbstractManoptProblem, F}")
+
+define!(
+    :Variable,
+    :sub_kwargs,
+    :description,
+    "a named tuple of keyword arguments that are passed to [`decorate_objective!`](@ref) of the sub solver's objective, the [`decorate_state!`](@ref) of the sub solver's state, and the sub state constructor itself.",
+)
+define!(:Variable, :sub_kwargs, :default, "`(;)`")
+
+define!(
+    :Variable,
+    :sub_state,
+    :description,
+    (; M="M") ->
+        "a state to specify the sub solver to use.
For a closed form solution, this indicates the type of function.",
+)
+define!(:Variable, :sub_state, :type, "Union{AbstractManoptSolverState, F}")
+
+define!(:Variable, :subgrad_f, :symbol, "∂f")
+define!(
+    :Variable,
+    :subgrad_f,
+    :description,
+    (; M="M", p="p") -> """
+the subgradient ``∂f: $(_math(:M; M=M)) → $(_math(:TM; M=M))`` of f as a function `(M, p) -> X`
+or a function `(M, X, p) -> X` computing `X` in-place.
+This function should always only return one element from the subgradient.
+""",
+)
+
+define!(
+    :Variable,
+    :vector_transport_method,
+    :description,
+    (; M="M", p="p") ->
+        "a vector transport ``$(_math(:vector_transport, :symbol))`` to use, see [the section on vector transports](@extref ManifoldsBase :doc:`vector_transports`)",
+)
+define!(:Variable, :vector_transport_method, :type, "AbstractVectorTransportMethod")
+define!(
+    :Variable,
+    :vector_transport_method,
+    :default,
+    (; M="M", p="p") ->
+        "[`default_vector_transport_method`](@extref `ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}`)`($M, typeof($p))`",
+)
+
+define!(
+    :Variable,
+    :X,
+    :description,
+    (; M="M", p="p") ->
+        "a tangent vector at the point ``$p`` on the manifold ``$(_tex(:Cal, M))``",
+)
+define!(:Variable, :X, :type, "T")
+define!(:Variable, :X, :default, (; M="M", p="p") -> _link(:zero_vector; M=M, p=p))
+define!(:Variable, :X, :as_Gradient, "storing the gradient at the current iterate")
+define!(:Variable, :X, :as_Subgradient, "storing a subgradient at the current iterate")
+define!(:Variable, :X, :as_Memory, "to specify the representation of a tangent vector")
diff --git a/src/helpers/LineSearchesTypes.jl b/src/helpers/LineSearchesTypes.jl
index bf803f245c..8d429bb593 100644
--- a/src/helpers/LineSearchesTypes.jl
+++ b/src/helpers/LineSearchesTypes.jl
@@
-19,11 +19,8 @@ The initial step selection from Linesearches.jl is not yet supported and the val # Keyword Arguments -* $_kw_retraction_method_default: - $_kw_retraction_method -* $_kw_vector_transport_method_default: - $_kw_vector_transport_method - +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :vector_transport_method)) """ struct LineSearchesStepsize{ TLS,TRM<:AbstractRetractionMethod,TVTM<:AbstractVectorTransportMethod diff --git a/src/helpers/checks.jl b/src/helpers/checks.jl index b2309d5950..d771dee7f1 100644 --- a/src/helpers/checks.jl +++ b/src/helpers/checks.jl @@ -21,8 +21,7 @@ no plot is generated, * `name="differential"`: name to display in the plot * `plot=false`: whether to plot the result (if `Plots.jl` is loaded). The plot is in log-log-scale. This is returned and can then also be saved. -* $_kw_retraction_method_default: - $_kw_retraction_method +$(_var(:Keyword, :retraction_method)) * `slope_tol=0.1`: tolerance for the slope (global) of the approximation * `throw_error=false`: throw an error message if the differential is wrong * `window=nothing`: specify window sizes within the `log_range` that are used for @@ -82,7 +81,7 @@ Verify numerically whether the gradient `grad_f(M,p)` of `f(M,p)` is correct, th $_doc_check_gradient_formula or in other words, that the error between the function ``f`` and its first order Taylor -behaves in error ``$_l_cO O(t^2)``, which indicates that the gradient is correct, +behaves in error ``$(_tex(:Cal, "O"))(t^2)``, which indicates that the gradient is correct, cf. also [Boumal:2023; Section 4.8](@cite). Note that if the errors are below the given tolerance and the method is exact, @@ -91,7 +90,7 @@ no plot is generated. # Keyword arguments * `check_vector=true`: - verify that ``$_l_grad f(p) ∈ $(_l_TpM())`` using `is_vector`. + verify that ``$(_tex(:grad))f(p) ∈ $(_math(:TpM))`` using `is_vector`. 
* `exactness_tol=1e-12`: if all errors are below this tolerance, the gradient is considered to be exact * `io=nothing`: @@ -107,8 +106,7 @@ no plot is generated. * `plot=false`: whether to plot the result (if `Plots.jl` is loaded). The plot is in log-log-scale. This is returned and can then also be saved. -* $_kw_retraction_method_default: - $_kw_retraction_method +$(_var(:Keyword, :retraction_method)) * `slope_tol=0.1`: tolerance for the slope (global) of the approximation * `atol`=:none`: @@ -161,7 +159,7 @@ The approximation is then $_doc_check_Hess_formula or in other words, that the error between the function ``f`` and its second order Taylor -behaves in error ``$_l_cO (t^3)``, which indicates that the Hessian is correct, +behaves in error ``$(_tex(:Cal, "O"))(t^3)``, which indicates that the Hessian is correct, cf. also [Boumal:2023; Section 6.8](@cite). Note that if the errors are below the given tolerance and the method is exact, @@ -170,13 +168,13 @@ no plot is generated. # Keyword arguments * `check_grad=true`: - verify that ``$_l_grad f(p) ∈ $(_l_TpM())``. + verify that ``$(_tex(:grad))f(p) ∈ $(_math(:TpM))``. * `check_linearity=true`: verify that the Hessian is linear, see [`is_Hessian_linear`](@ref) using `a`, `b`, `X`, and `Y` * `check_symmetry=true`: verify that the Hessian is symmetric, see [`is_Hessian_symmetric`](@ref) * `check_vector=false`: - verify that `$_l_Hess f(p)[X] ∈ $(_l_TpM())`` using `is_vector`. + verify that `$(_tex(:Hess)) f(p)[X] ∈ $(_math(:TpM))`` using `is_vector`. * `mode=:Default`: specify the mode for the verification; the default assumption is, that the retraction provided is of second order. Otherwise one can also verify the Hessian @@ -194,7 +192,7 @@ no plot is generated. * `gradient=grad_f(M, p)`: instead of the gradient function you can also provide the gradient at `p` directly * `Hessian=Hess_f(M, p, X)`: - instead of the Hessian function you can provide the result of ``$_l_Hess f(p)[X]`` directly. 
+ instead of the Hessian function you can provide the result of ``$(_tex(:Hess)) f(p)[X]`` directly. Note that evaluations of the Hessian might still be necessary for checking linearity and symmetry and/or when using `:CriticalPoint` mode. * `limits=(1e-8,1)`: specify the limits in the `log_range` @@ -204,8 +202,7 @@ no plot is generated. number of points to use within the `log_range` default range ``[10^{-8},10^{0}]`` * `plot=false`: whether to plot the resulting verification (requires `Plots.jl` to be loaded). The plot is in log-log-scale. This is returned and can then also be saved. -* $_kw_retraction_method_default: - $_kw_retraction_method +$(_var(:Keyword, :retraction_method)) * `slope_tol=0.1`: tolerance for the slope (global) of the approximation * `error=:none`: diff --git a/src/plans/Douglas_Rachford_plan.jl b/src/plans/Douglas_Rachford_plan.jl index 89eb927d9c..a2a858dfcb 100644 --- a/src/plans/Douglas_Rachford_plan.jl +++ b/src/plans/Douglas_Rachford_plan.jl @@ -22,16 +22,17 @@ reflect(M::AbstractManifold, pr::Function, x; kwargs...) Reflect the point `x` from the manifold `M` at point `p`, given by -$_math_reflect +```math +$(_tex(:reflect))_p(q) = $(_tex(:retr))_p(-$(_tex(:invretr))_p q), +``` +where ``$(_tex(:retr))`` and ``$(_tex(:invretr))`` denote a retraction and an inverse retraction, respectively. This can also be done in place of `q`. 
## Keyword arguments -* $_kw_retraction_method_default: - $_kw_retraction_method -* $_kw_inverse_retraction_method_default: - $_kw_inverse_retraction_method +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :inverse_retraction_method)) and for the `reflect!` additionally diff --git a/src/plans/adabtive_regularization_with_cubics_plan.jl b/src/plans/adabtive_regularization_with_cubics_plan.jl index f682d96f6b..e5b297a44c 100644 --- a/src/plans/adabtive_regularization_with_cubics_plan.jl +++ b/src/plans/adabtive_regularization_with_cubics_plan.jl @@ -36,7 +36,7 @@ function AdaptiveRagularizationWithCubicsModelObjective( } return AdaptiveRagularizationWithCubicsModelObjective{E,O,R}(mho, σ) end -function set_manopt_parameter!( +function set_parameter!( f::AdaptiveRagularizationWithCubicsModelObjective, ::Union{Val{:σ},Val{:RegularizationParameter}}, σ, diff --git a/src/plans/augmented_lagrangian_plan.jl b/src/plans/augmented_lagrangian_plan.jl index 8f696b70db..477585ed8d 100644 --- a/src/plans/augmented_lagrangian_plan.jl +++ b/src/plans/augmented_lagrangian_plan.jl @@ -1,5 +1,4 @@ -#_doc_al_Cost() = "$(_l_cal("L"))_\\rho(p, μ, λ)" -_doc_al_Cost(iter) = "$(_l_cal("L"))_{ρ^{($iter)}}(p, μ^{($iter)}, λ^{($iter)})" +_doc_AL_Cost(iter) = "$(_tex(:Cal, "L"))_{ρ^{($iter)}}(p, μ^{($iter)}, λ^{($iter)})" _doc_AL_Cost_long = raw""" ```math \mathcal L_\rho(p, μ, λ) @@ -37,11 +36,11 @@ mutable struct AugmentedLagrangianCost{CO,R,T} <: AbstractConstrainedFunctor{T} μ::T λ::T end -function set_manopt_parameter!(alc::AugmentedLagrangianCost, ::Val{:ρ}, ρ) +function set_parameter!(alc::AugmentedLagrangianCost, ::Val{:ρ}, ρ) alc.ρ = ρ return alc end -get_manopt_parameter(alc::AugmentedLagrangianCost, ::Val{:ρ}) = alc.ρ +get_parameter(alc::AugmentedLagrangianCost, ::Val{:ρ}) = alc.ρ function (L::AugmentedLagrangianCost)(M::AbstractManifold, p) gp = get_inequality_constraint(M, L.co, p, :) @@ -69,7 +68,7 @@ additionally this gradient does accept a positional last argument to 
specify the for the internal gradient call of the constrained objective. based on the internal [`ConstrainedManifoldObjective`](@ref) and computes the gradient -`$_l_grad $(_l_cal("L"))_{ρ}(p, μ, λ)``, see also [`AugmentedLagrangianCost`](@ref). +``$(_tex(:grad))$(_tex(:Cal, "L"))_{ρ}(p, μ, λ)``, see also [`AugmentedLagrangianCost`](@ref). ## Fields @@ -91,11 +90,11 @@ function (LG::AugmentedLagrangianGrad)(M::AbstractManifold, p) X = zero_vector(M, p) return LG(M, X, p) end -function set_manopt_parameter!(alg::AugmentedLagrangianGrad, ::Val{:ρ}, ρ) +function set_parameter!(alg::AugmentedLagrangianGrad, ::Val{:ρ}, ρ) alg.ρ = ρ return alg end -get_manopt_parameter(alg::AugmentedLagrangianGrad, ::Val{:ρ}) = alg.ρ +get_parameter(alg::AugmentedLagrangianGrad, ::Val{:ρ}) = alg.ρ # default, that is especially when the `grad_g` and `grad_h` are functions. function (LG::AugmentedLagrangianGrad)( M::AbstractManifold, X, p, range=NestedPowerRepresentation() diff --git a/src/plans/cache.jl b/src/plans/cache.jl index 0b58720637..098c0c2e8f 100644 --- a/src/plans/cache.jl +++ b/src/plans/cache.jl @@ -5,7 +5,7 @@ SimpleManifoldCachedObjective{O<:AbstractManifoldGradientObjective{E,TC,TG}, P, T,C} <: AbstractManifoldGradientObjective{E,TC,TG} Provide a simple cache for an [`AbstractManifoldGradientObjective`](@ref) that is for a given point `p` this cache -stores a point `p` and a gradient ``$(_l_grad) f(p)`` in `X` as well as a cost value ``f(p)`` in `c`. +stores a point `p` and a gradient ``$(_tex(:grad)) f(p)`` in `X` as well as a cost value ``f(p)`` in `c`. Both `X` and `c` are accompanied by booleans to keep track of their validity. @@ -15,7 +15,7 @@ Both `X` and `c` are accompanied by booleans to keep track of their validity. 
## Keyword arguments -* `p=`$(_link_rand()): a point on the manifold to initialize the cache with +* `p=`$(Manopt._link(:rand)): a point on the manifold to initialize the cache with * `X=get_gradient(M, obj, p)` or `zero_vector(M,p)`: a tangent vector to store the gradient in, see also `initialize=` * `c=[`get_cost`](@ref)`(M, obj, p)` or `0.0`: a value to store the cost function in `initialize` diff --git a/src/plans/conjugate_gradient_plan.jl b/src/plans/conjugate_gradient_plan.jl index d8328a69f0..859d3a87f4 100644 --- a/src/plans/conjugate_gradient_plan.jl +++ b/src/plans/conjugate_gradient_plan.jl @@ -27,19 +27,19 @@ specify options for a conjugate gradient descent algorithm, that solves a # Fields -* $_field_p -* $_field_X +$(_var(:Field, :p; add=[:as_Iterate])) +$(_var(:Field, :X)) * `δ`: the current descent direction, also a tangent vector * `β`: the current update coefficient rule, see . * `coefficient`: function to determine the new `β` -* $_field_step -* $_field_stop -* $_field_retr -* $_field_vector_transp +$(_var(:Field, :stepsize)) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :retraction_method)) +$(_var(:Field, :vector_transport_method)) # Constructor - ConjugateGradientState(M, p) + ConjugateGradientState(M::AbstractManifold; kwargs...) where the last five fields can be set by their names as keyword and the `X` can be set to a tangent vector type using the keyword `initial_gradient` which defaults to `zero_vector(M,p)`, @@ -49,11 +49,13 @@ and `δ` is initialized to a copy of this vector. The following fields from above β_k` to compute the conjugate gradient update coefficient adapted to manifolds See also [`conjugate_gradient_descent`](@ref) # Constructor - ConjugateDescentCoefficient(a::StoreStateAction=()) + + ConjugateDescentCoefficientRule() Construct the conjugate descent coefficient update rule, a new storage is created by default. 
+ +# See also + +[`ConjugateDescentCoefficient`](@ref), [`conjugate_gradient_descent`](@ref) +""" +struct ConjugateDescentCoefficientRule <: DirectionUpdateRule end + """ -struct ConjugateDescentCoefficient <: DirectionUpdateRule end + ConjugateDescentCoefficient() + ConjugateDescentCoefficient(M::AbstractManifold) -update_rule_storage_points(::ConjugateDescentCoefficient) = Tuple{:Iterate} -update_rule_storage_vectors(::ConjugateDescentCoefficient) = Tuple{:Gradient} +Compute the (classical) conjugate gradient coefficient based on [Fletcher:1987](@cite) adapted to manifolds -function (u::DirectionUpdateRuleStorage{ConjugateDescentCoefficient})( +$(_doc_CG_notaion) + +Then the coefficient reads +```math +β_k = $(_tex(:frac, _tex(:norm, "X_{k+1}"; index="p_{k+1}")*"^2", "⟨-δ_k,X_k⟩_{p_k}")) +``` + +$(_note(:ManifoldDefaultFactory, "ConjugateDescentCoefficientRule")) +""" +function ConjugateDescentCoefficient() + return ManifoldDefaultsFactory( + Manopt.ConjugateDescentCoefficientRule; requires_manifold=false + ) +end + +update_rule_storage_points(::ConjugateDescentCoefficientRule) = Tuple{:Iterate} +update_rule_storage_vectors(::ConjugateDescentCoefficientRule) = Tuple{:Gradient} + +function (u::DirectionUpdateRuleStorage{ConjugateDescentCoefficientRule})( amp::AbstractManoptProblem, cgs::ConjugateGradientDescentState, i ) M = get_manifold(amp) @@ -195,52 +202,46 @@ function (u::DirectionUpdateRuleStorage{ConjugateDescentCoefficient})( update_storage!(u.storage, amp, cgs) return coef end -show(io::IO, ::ConjugateDescentCoefficient) = print(io, "ConjugateDescentCoefficient()") +function show(io::IO, ::ConjugateDescentCoefficientRule) + return print(io, "Manopt.ConjugateDescentCoefficientRule()") +end -@doc raw""" - DaiYuanCoefficient <: DirectionUpdateRule +@doc """ + DaiYuanCoefficientRule <: DirectionUpdateRule -Computes an update coefficient for the conjugate gradient method, where -the [`ConjugateGradientDescentState`](@ref)` cgds` include the last iterates 
-``p_k,X_k``, the current iterates ``p_{k+1},X_{k+1}`` of the iterate and the gradient, respectively, -and the last update direction ``\delta=\delta_k``, based on [DaiYuan:1999](@cite) adapted to manifolds: +A functor `(problem, state, k) -> β_k` to compute the conjugate gradient update coefficient based on [DaiYuan:1999](@cite) adapted to manifolds -Let ``\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k``, -where ``P_{a\gets b}(⋅)`` denotes a vector transport from the tangent space at ``a`` to ``b``. +# Fields -Then the coefficient reads +$(_var(:Field, :vector_transport_method)) -````math -β_k = -\frac{ \lVert X_{k+1} \rVert_{p_{k+1}}^2 } -{\langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}}}. -```` +# Constructor -See also [`conjugate_gradient_descent`](@ref) + DaiYuanCoefficientRule(M::AbstractManifold; kwargs...) -# Constructor - function DaiYuanCoefficient( - M::AbstractManifold=DefaultManifold(2); - t::AbstractVectorTransportMethod=default_vector_transport_method(M) - ) +Construct the Dai—Yuan coefficient update rule. + +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) -Construct the Dai—Yuan coefficient update rule, where the parallel transport is the -default vector transport and a new storage is created by default. 
+# See also + +[`DaiYuanCoefficient`](@ref), [`conjugate_gradient_descent`](@ref) """ -struct DaiYuanCoefficient{TVTM<:AbstractVectorTransportMethod} <: DirectionUpdateRule - transport_method::TVTM - function DaiYuanCoefficient(t::AbstractVectorTransportMethod) - return new{typeof(t)}(t) - end +struct DaiYuanCoefficientRule{VTM<:AbstractVectorTransportMethod} <: DirectionUpdateRule + vector_transport_method::VTM end -function DaiYuanCoefficient(M::AbstractManifold=DefaultManifold(2)) - return DaiYuanCoefficient(default_vector_transport_method(M)) +function DaiYuanCoefficientRule( + M::AbstractManifold; vector_transport_method::VTM=default_vector_transport_method(M) +) where {VTM<:AbstractVectorTransportMethod} + return DaiYuanCoefficientRule{VTM}(vector_transport_method) end -update_rule_storage_points(::DaiYuanCoefficient) = Tuple{:Iterate} -update_rule_storage_vectors(::DaiYuanCoefficient) = Tuple{:Gradient,:δ} +update_rule_storage_points(::DaiYuanCoefficientRule) = Tuple{:Iterate} +update_rule_storage_vectors(::DaiYuanCoefficientRule) = Tuple{:Gradient,:δ} -function (u::DirectionUpdateRuleStorage{<:DaiYuanCoefficient})( +function (u::DirectionUpdateRuleStorage{<:DaiYuanCoefficientRule})( amp::AbstractManoptProblem, cgs::ConjugateGradientDescentState, i ) M = get_manifold(amp) @@ -254,43 +255,73 @@ function (u::DirectionUpdateRuleStorage{<:DaiYuanCoefficient})( X_old = get_storage(u.storage, VectorStorageKey(:Gradient)) δ_old = get_storage(u.storage, VectorStorageKey(:δ)) - gradienttr = vector_transport_to(M, p_old, X_old, cgs.p, u.coefficient.transport_method) + gradienttr = vector_transport_to( + M, p_old, X_old, cgs.p, u.coefficient.vector_transport_method + ) ν = cgs.X - gradienttr #notation y from [HZ06] - δtr = vector_transport_to(M, p_old, δ_old, cgs.p, u.coefficient.transport_method) + δtr = vector_transport_to(M, p_old, δ_old, cgs.p, u.coefficient.vector_transport_method) coef = inner(M, cgs.p, cgs.X, cgs.X) / inner(M, p_old, δtr, ν) 
update_storage!(u.storage, amp, cgs) return coef end -function show(io::IO, u::DaiYuanCoefficient) - return print(io, "DaiYuanCoefficient($(u.transport_method))") +function show(io::IO, u::DaiYuanCoefficientRule) + return print( + io, + "Manopt.DaiYuanCoefficientRule(; vector_transport_method=$(u.vector_transport_method))", + ) end -@doc raw""" - FletcherReevesCoefficient <: DirectionUpdateRule +@doc """ + DaiYuanCoefficient(; kwargs...) + DaiYuanCoefficient(M::AbstractManifold; kwargs...) -Computes an update coefficient for the conjugate gradient method, where -the [`ConjugateGradientDescentState`](@ref)` cgds` include the last iterates -``p_k,X_k``, the current iterates ``p_{k+1},X_{k+1}`` of the iterate and the gradient, respectively, -and the last update direction ``\delta=\delta_k``, based on [FletcherReeves:1964](@cite) adapted to manifolds: +Computes an update coefficient for the [`conjugate_gradient_descent`](@ref) algorithm based on [DaiYuan:1999](@cite) adapted to +Riemannian manifolds. +$(_doc_CG_notaion) +Let ``ν_k = X_{k+1} - $(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))X_k``, +where ``$(_math(:vector_transport, :symbol))`` denotes a vector transport. + +Then the coefficient reads ````math β_k = -\frac{\lVert X_{k+1}\rVert_{p_{k+1}}^2}{\lVert X_k\rVert_{x_{k}}^2}. +$(_tex( + :frac, + _tex(:norm, "X_{k+1}"; index="p_{k+1}")*"^2", + "⟨$(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))δ_k, ν_k⟩_{p_{k+1}}" +)) ```` -See also [`conjugate_gradient_descent`](@ref) +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +$(_note(:ManifoldDefaultFactory, "DaiYuanCoefficientRule")) +""" +function DaiYuanCoefficient(args...; kwargs...) + return ManifoldDefaultsFactory(Manopt.DaiYuanCoefficientRule, args...; kwargs...) 
+end + +@doc """ + FletcherReevesCoefficientRule <: DirectionUpdateRule + +A functor `(problem, state, k) -> β_k` to compute the conjugate gradient update coefficient based on [FletcherReeves:1964](@cite) adapted to manifolds # Constructor - FletcherReevesCoefficient(a::StoreStateAction=()) -Construct the Fletcher—Reeves coefficient update rule, a new storage is created by default. + FletcherReevesCoefficientRule() + +Construct the Fletcher—Reeves coefficient update rule. + +# See also +[`FletcherReevesCoefficient`](@ref), [`conjugate_gradient_descent`](@ref) """ -struct FletcherReevesCoefficient <: DirectionUpdateRule end +struct FletcherReevesCoefficientRule <: DirectionUpdateRule end -update_rule_storage_points(::FletcherReevesCoefficient) = Tuple{:Iterate} -update_rule_storage_vectors(::FletcherReevesCoefficient) = Tuple{:Gradient} +update_rule_storage_points(::FletcherReevesCoefficientRule) = Tuple{:Iterate} +update_rule_storage_vectors(::FletcherReevesCoefficientRule) = Tuple{:Gradient} -function (u::DirectionUpdateRuleStorage{FletcherReevesCoefficient})( +function (u::DirectionUpdateRuleStorage{FletcherReevesCoefficientRule})( amp::AbstractManoptProblem, cgs::ConjugateGradientDescentState, i ) M = get_manifold(amp) @@ -304,55 +335,73 @@ function (u::DirectionUpdateRuleStorage{FletcherReevesCoefficient})( update_storage!(u.storage, amp, cgs) return coef end -function show(io::IO, ::FletcherReevesCoefficient) - return print(io, "FletcherReevesCoefficient()") +function show(io::IO, ::FletcherReevesCoefficientRule) + return print(io, "Manopt.FletcherReevesCoefficientRule()") end -@doc raw""" - HagerZhangCoefficient <: DirectionUpdateRule +@doc """ + FletcherReevesCoefficient() + FletcherReevesCoefficient(M::AbstractManifold) + +Computes an update coefficient for the [`conjugate_gradient_descent`](@ref) algorithm based on [FletcherReeves:1964](@cite) adapted to manifolds -Computes an update coefficient for the conjugate gradient method, where -the 
[`ConjugateGradientDescentState`](@ref)` cgds` include the last iterates -``p_k,X_k``, the current iterates ``p_{k+1},X_{k+1}`` of the iterate and the gradient, respectively, -and the last update direction ``\delta=\delta_k``, based on [HagerZhang:2005](@cite). -adapted to manifolds: let ``\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k``, -where ``P_{a\gets b}(⋅)`` denotes a vector transport from the tangent space at ``a`` to ``b``. +$(_doc_CG_notaion) +Then the coefficient reads ````math -β_k = \Bigl\langle\nu_k - -\frac{ 2\lVert \nu_k\rVert_{p_{k+1}}^2 }{ \langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}} } -P_{p_{k+1}\gets p_k}\delta_k, -\frac{X_{k+1}}{ \langle P_{p_{k+1}\gets p_k}\delta_k, \nu_k \rangle_{p_{k+1}} } -\Bigr\rangle_{p_{k+1}}. +β_k = +$(_tex( + :frac, + _tex(:norm, "X_{k+1}"; index="p_{k+1}")*"^2", + _tex(:norm, "X_k"; index="p_{k}")*"^2" +)). ```` -This method includes a numerical stability proposed by those authors. +$(_note(:ManifoldDefaultFactory, "FletcherReevesCoefficientRule")) +""" +function FletcherReevesCoefficient() + return ManifoldDefaultsFactory( + Manopt.FletcherReevesCoefficientRule; requires_manifold=false + ) +end -See also [`conjugate_gradient_descent`](@ref) +@doc """ + HagerZhangCoefficientRule <: DirectionUpdateRule + +A functor `(problem, state, k) -> β_k` to compute the conjugate gradient update coefficient based on [HagerZhang:2005](@cite) adapted to manifolds + +# Fields + +$(_var(:Field, :vector_transport_method)) # Constructor - function HagerZhangCoefficient(t::AbstractVectorTransportMethod) - function HagerZhangCoefficient(M::AbstractManifold = DefaultManifold(2)) -Construct the Hager Zhang coefficient update rule, where the parallel transport is the -default vector transport and a new storage is created by default. + HagerZhangCoefficientRule(M::AbstractManifold; kwargs...) + +Construct the Hager-Zhang coefficient update rule based on [HagerZhang:2005](@cite) adapted to manifolds. 
+ +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +# See also + +[`HagerZhangCoefficient`](@ref), [`conjugate_gradient_descent`](@ref) """ -mutable struct HagerZhangCoefficient{TVTM<:AbstractVectorTransportMethod} <: +mutable struct HagerZhangCoefficientRule{VTM<:AbstractVectorTransportMethod} <: DirectionUpdateRule - transport_method::TVTM - - function HagerZhangCoefficient(t::AbstractVectorTransportMethod) - return new{typeof(t)}(t) - end + vector_transport_method::VTM end -function HagerZhangCoefficient(M::AbstractManifold=DefaultManifold(2)) - return HagerZhangCoefficient(default_vector_transport_method(M)) +function HagerZhangCoefficientRule( + M::AbstractManifold; vector_transport_method::VTM=default_vector_transport_method(M) +) where {VTM<:AbstractVectorTransportMethod} + return HagerZhangCoefficientRule{VTM}(vector_transport_method) end -update_rule_storage_points(::HagerZhangCoefficient) = Tuple{:Iterate} -update_rule_storage_vectors(::HagerZhangCoefficient) = Tuple{:Gradient,:δ} +update_rule_storage_points(::HagerZhangCoefficientRule) = Tuple{:Iterate} +update_rule_storage_vectors(::HagerZhangCoefficientRule) = Tuple{:Gradient,:δ} -function (u::DirectionUpdateRuleStorage{<:HagerZhangCoefficient})( +function (u::DirectionUpdateRuleStorage{<:HagerZhangCoefficientRule})( amp::AbstractManoptProblem, cgs::ConjugateGradientDescentState, i ) M = get_manifold(amp) @@ -366,9 +415,11 @@ function (u::DirectionUpdateRuleStorage{<:HagerZhangCoefficient})( X_old = get_storage(u.storage, VectorStorageKey(:Gradient)) δ_old = get_storage(u.storage, VectorStorageKey(:δ)) - gradienttr = vector_transport_to(M, p_old, X_old, cgs.p, u.coefficient.transport_method) + gradienttr = vector_transport_to( + M, p_old, X_old, cgs.p, u.coefficient.vector_transport_method + ) ν = cgs.X - gradienttr #notation y from [HZ06] - δtr = vector_transport_to(M, p_old, δ_old, cgs.p, u.coefficient.transport_method) + δtr = vector_transport_to(M, p_old, δ_old, cgs.p, 
u.coefficient.vector_transport_method) denom = inner(M, cgs.p, δtr, ν) νknormsq = inner(M, cgs.p, ν, ν) β = @@ -381,53 +432,84 @@ function (u::DirectionUpdateRuleStorage{<:HagerZhangCoefficient})( update_storage!(u.storage, amp, cgs) return coef end -function show(io::IO, u::HagerZhangCoefficient) - return print(io, "HagerZhangCoefficient($(u.transport_method))") +function show(io::IO, u::HagerZhangCoefficientRule) + return print( + io, + "Manopt.HagerZhangCoefficientRule(; vector_transport_method=$(u.vector_transport_method))", + ) end -@doc raw""" - HestenesStiefelCoefficient <: DirectionUpdateRule +@doc """ + HagerZhangCoefficient(; kwargs...) + HagerZhangCoefficient(M::AbstractManifold; kwargs...) -Computes an update coefficient for the conjugate gradient method, where -the [`ConjugateGradientDescentState`](@ref)` cgds` include the last iterates -``p_k,X_k``, the current iterates ``p_{k+1},X_{k+1}`` of the iterate and the gradient, respectively, -and the last update direction ``\delta=\delta_k``, based on [HestenesStiefel:1952](@cite) -adapted to manifolds as follows: +Computes an update coefficient for the [`conjugate_gradient_descent`](@ref) algorithm based on [HagerZhang:2005](@cite) adapted to manifolds -Let ``\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k``. -Then the update reads +$(_doc_CG_notaion) +Let ``ν_k = X_{k+1} - $(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))X_k``, +where ``$(_math(:vector_transport, :symbol))`` denotes a vector transport. 
-````math -β_k = \frac{\langle X_{k+1}, \nu_k \rangle_{p_{k+1}} } - { \langle P_{p_{k+1}\gets p_k} \delta_k, \nu_k\rangle_{p_{k+1}} }, -```` +Then the coefficient reads +```math +β_k = $(_tex(:Bigl))⟨ν_k - $(_tex( + :frac, + "2$(_tex(:norm, "ν_k"; index="p_{k+1}"))^2", + "⟨$(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))δ_k, ν_k⟩_{p_{k+1}}", + )) + $(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))δ_k, + $(_tex(:frac, "X_{k+1}", "⟨$(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))δ_k, ν_k⟩_{p_{k+1}}")) +$(_tex(:Bigr))⟩_{p_{k+1}}. +``` + +This method includes a numerical stabilization proposed by those authors. + +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +$(_note(:ManifoldDefaultFactory, "HagerZhangCoefficientRule")) +""" +function HagerZhangCoefficient(args...; kwargs...) + return ManifoldDefaultsFactory(Manopt.HagerZhangCoefficientRule, args...; kwargs...) +end + +@doc """ + HestenesStiefelCoefficientRule <: DirectionUpdateRule + +A functor `(problem, state, k) -> β_k` to compute the conjugate gradient update coefficient based on [HestenesStiefel:1952](@cite) adapted to manifolds -where ``P_{a\gets b}(⋅)`` denotes a vector transport from the tangent space at ``a`` to ``b``. +# Fields + +$(_var(:Field, :vector_transport_method)) # Constructor - function HestenesStiefelCoefficient(transport_method::AbstractVectorTransportMethod) - function HestenesStiefelCoefficient(M::AbstractManifold = DefaultManifold(2)) -Construct the Heestens Stiefel coefficient update rule, where the parallel transport is the -default vector transport and a new storage is created by default. + HestenesStiefelCoefficientRule(M::AbstractManifold; kwargs...) -See also [`conjugate_gradient_descent`](@ref) +Construct the Hestenes-Stiefel coefficient update rule based on [HestenesStiefel:1952](@cite) adapted to manifolds. 
+ +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +# See also + +[`HestenesStiefelCoefficient`](@ref), [`conjugate_gradient_descent`](@ref) """ -struct HestenesStiefelCoefficient{TVTM<:AbstractVectorTransportMethod} <: +struct HestenesStiefelCoefficientRule{VTM<:AbstractVectorTransportMethod} <: DirectionUpdateRule - transport_method::TVTM - function HestenesStiefelCoefficient(t::AbstractVectorTransportMethod) - return new{typeof(t)}(t) - end + vector_transport_method::VTM end -function HestenesStiefelCoefficient(M::AbstractManifold=DefaultManifold(2)) - return HestenesStiefelCoefficient(default_vector_transport_method(M)) +function HestenesStiefelCoefficientRule( + M::AbstractManifold; vector_transport_method::VTM=default_vector_transport_method(M) +) where {VTM<:AbstractVectorTransportMethod} + return HestenesStiefelCoefficientRule{VTM}(vector_transport_method) end -update_rule_storage_points(::HestenesStiefelCoefficient) = Tuple{:Iterate} -update_rule_storage_vectors(::HestenesStiefelCoefficient) = Tuple{:Gradient,:δ} +update_rule_storage_points(::HestenesStiefelCoefficientRule) = Tuple{:Iterate} +update_rule_storage_vectors(::HestenesStiefelCoefficientRule) = Tuple{:Gradient,:δ} -function (u::DirectionUpdateRuleStorage{<:HestenesStiefelCoefficient})( +function (u::DirectionUpdateRuleStorage{<:HestenesStiefelCoefficientRule})( amp::AbstractManoptProblem, cgs::ConjugateGradientDescentState, i ) M = get_manifold(amp) @@ -441,60 +523,91 @@ function (u::DirectionUpdateRuleStorage{<:HestenesStiefelCoefficient})( X_old = get_storage(u.storage, VectorStorageKey(:Gradient)) δ_old = get_storage(u.storage, VectorStorageKey(:δ)) - gradienttr = vector_transport_to(M, p_old, X_old, cgs.p, u.coefficient.transport_method) - δtr = vector_transport_to(M, p_old, δ_old, cgs.p, u.coefficient.transport_method) + gradienttr = vector_transport_to( + M, p_old, X_old, cgs.p, u.coefficient.vector_transport_method + ) + δtr = vector_transport_to(M, p_old, δ_old, 
cgs.p, u.coefficient.vector_transport_method) ν = cgs.X - gradienttr #notation from [HZ06] β = inner(M, cgs.p, cgs.X, ν) / inner(M, cgs.p, δtr, ν) update_storage!(u.storage, amp, cgs) return max(0, β) end -function show(io::IO, u::HestenesStiefelCoefficient) - return print(io, "HestenesStiefelCoefficient($(u.transport_method))") +function show(io::IO, u::HestenesStiefelCoefficientRule) + return print( + io, + "Manopt.HestenesStiefelCoefficientRule(; vector_transport_method=$(u.vector_transport_method))", + ) end -@doc raw""" - LiuStoreyCoefficient <: DirectionUpdateRule +""" + HestenesStiefelCoefficient(; kwargs...) + HestenesStiefelCoefficient(M::AbstractManifold; kwargs...) -Computes an update coefficient for the conjugate gradient method, where -the [`ConjugateGradientDescentState`](@ref)` cgds` include the last iterates -``p_k,X_k``, the current iterates ``p_{k+1},X_{k+1}`` of the iterate and the gradient, respectively, -and the last update direction ``\delta=\delta_k``, based on [LiuStorey:1991](@cite) -adapted to manifolds: +Computes an update coefficient for the [`conjugate_gradient_descent`](@ref) algorithm based on [HestenesStiefel:1952](@cite) adapted to manifolds -Let ``\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k``, -where ``P_{a\gets b}(⋅)`` denotes a vector transport from the tangent space at ``a`` to ``b``. + +$(_doc_CG_notaion) +Let ``ν_k = X_{k+1} - $(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))X_k``, +where ``$(_math(:vector_transport, :symbol))`` denotes a vector transport. Then the coefficient reads -````math -β_k = - -\frac{ \langle X_{k+1},\nu_k \rangle_{p_{k+1}} } -{\langle \delta_k,X_k \rangle_{p_k}}. -```` +```math +β_k = $(_tex( + :frac, + "⟨ X_{k+1}, ν_k ⟩_{p_{k+1}}", + "⟨ $(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))δ_k, ν_k⟩_{p_{k+1}}", +)). 
+``` -See also [`conjugate_gradient_descent`](@ref) +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +$(_note(:ManifoldDefaultFactory, "HestenesStiefelCoefficientRule")) +""" +function HestenesStiefelCoefficient(args...; kwargs...) + return ManifoldDefaultsFactory( + Manopt.HestenesStiefelCoefficientRule, args...; kwargs... + ) +end + +@doc """ + LiuStoreyCoefficientRule <: DirectionUpdateRule + +A functor `(problem, state, k) -> β_k` to compute the conjugate gradient update coefficient based on [LiuStorey:1991](@cite) adapted to manifolds + +# Fields + +$(_var(:Field, :vector_transport_method)) # Constructor - function LiuStoreyCoefficient(t::AbstractVectorTransportMethod) - function LiuStoreyCoefficient(M::AbstractManifold = DefaultManifold(2)) -Construct the Lui Storey coefficient update rule, where the parallel transport is the -default vector transport and a new storage is created by default. + LiuStoreyCoefficientRule(M::AbstractManifold; kwargs...) + +Construct the Liu-Storey coefficient update rule based on [LiuStorey:1991](@cite) adapted to manifolds. 
+ +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +# See also + +[`LiuStoreyCoefficient`](@ref), [`conjugate_gradient_descent`](@ref) """ -struct LiuStoreyCoefficient{TVTM<:AbstractVectorTransportMethod} <: DirectionUpdateRule - transport_method::TVTM - function LiuStoreyCoefficient(t::AbstractVectorTransportMethod) - return new{typeof(t)}(t) - end +struct LiuStoreyCoefficientRule{VTM<:AbstractVectorTransportMethod} <: DirectionUpdateRule + vector_transport_method::VTM end -function LiuStoreyCoefficient(M::AbstractManifold=DefaultManifold(2)) - return LiuStoreyCoefficient(default_vector_transport_method(M)) +function LiuStoreyCoefficientRule( + M::AbstractManifold; vector_transport_method::VTM=default_vector_transport_method(M) +) where {VTM<:AbstractVectorTransportMethod} + return LiuStoreyCoefficientRule{VTM}(vector_transport_method) end -update_rule_storage_points(::LiuStoreyCoefficient) = Tuple{:Iterate} -update_rule_storage_vectors(::LiuStoreyCoefficient) = Tuple{:Gradient,:δ} +update_rule_storage_points(::LiuStoreyCoefficientRule) = Tuple{:Iterate} +update_rule_storage_vectors(::LiuStoreyCoefficientRule) = Tuple{:Gradient,:δ} -function (u::DirectionUpdateRuleStorage{<:LiuStoreyCoefficient})( +function (u::DirectionUpdateRuleStorage{<:LiuStoreyCoefficientRule})( amp::AbstractManoptProblem, cgs::ConjugateGradientDescentState, i ) M = get_manifold(amp) @@ -506,62 +619,82 @@ function (u::DirectionUpdateRuleStorage{<:LiuStoreyCoefficient})( p_old = get_storage(u.storage, PointStorageKey(:Iterate)) X_old = get_storage(u.storage, VectorStorageKey(:Gradient)) δ_old = get_storage(u.storage, VectorStorageKey(:δ)) - gradienttr = vector_transport_to(M, p_old, X_old, cgs.p, u.coefficient.transport_method) + gradienttr = vector_transport_to( + M, p_old, X_old, cgs.p, u.coefficient.vector_transport_method + ) ν = cgs.X - gradienttr # notation y from [HZ06] coef = inner(M, cgs.p, cgs.X, ν) / inner(M, p_old, -δ_old, X_old) update_storage!(u.storage, amp, 
cgs) return coef end -function show(io::IO, u::LiuStoreyCoefficient) - return print(io, "LiuStoreyCoefficient($(u.transport_method))") +function show(io::IO, u::LiuStoreyCoefficientRule) + return print( + io, + "Manopt.LiuStoreyCoefficientRule(; vector_transport_method=$(u.vector_transport_method))", + ) end -@doc raw""" - PolakRibiereCoefficient <: DirectionUpdateRule +""" + LiuStoreyCoefficient(; kwargs...) + LiuStoreyCoefficient(M::AbstractManifold; kwargs...) -Computes an update coefficient for the conjugate gradient method, where -the [`ConjugateGradientDescentState`](@ref)` cgds` include the last iterates -``p_k,X_k``, the current iterates ``p_{k+1},X_{k+1}`` of the iterate and the gradient, respectively, -and the last update direction ``\delta=\delta_k``, based on [PolakRibiere:1969](@cite) -and [Polyak:1969](@cite) adapted to manifolds: +Computes an update coefficient for the [`conjugate_gradient_descent`](@ref) algorithm based on [LiuStorey:1991](@cite) adapted to manifolds -Let ``\nu_k = X_{k+1} - P_{p_{k+1}\gets p_k}X_k``, -where ``P_{a\gets b}(⋅)`` denotes a vector transport from the tangent space at ``a`` to ``b``. +$(_doc_CG_notaion) +Let ``ν_k = X_{k+1} - $(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))X_k``, +where ``$(_math(:vector_transport, :symbol))`` denotes a vector transport. -Then the update reads +Then the coefficient reads +```math +β_k = - $(_tex(:frac, "⟨ X_{k+1},ν_k ⟩_{p_{k+1}}", "⟨ δ_k,X_k ⟩_{p_k}")). +``` -````math -β_k = -\frac{ \langle X_{k+1}, \nu_k \rangle_{p_{k+1}} } -{\lVert X_k \rVert_{p_k}^2 }. -```` +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +$(_note(:ManifoldDefaultFactory, "LiuStoreyCoefficientRule")) +""" +function LiuStoreyCoefficient(args...; kwargs...) + return ManifoldDefaultsFactory(Manopt.LiuStoreyCoefficientRule, args...; kwargs...) 
+end + +@doc """ + PolakRibiereCoefficientRule <: DirectionUpdateRule + +A functor `(problem, state, k) -> β_k` to compute the conjugate gradient update coefficient based on [PolakRibiere:1969](@cite) adapted to manifolds + +# Fields + +$(_var(:Field, :vector_transport_method)) # Constructor - function PolakRibiereCoefficient( - M::AbstractManifold=DefaultManifold(2); - t::AbstractVectorTransportMethod=default_vector_transport_method(M) - ) + PolakRibiereCoefficientRule(M::AbstractManifold; kwargs...) -Construct the PolakRibiere coefficient update rule, where the parallel transport is the -default vector transport and a new storage is created by default. +Construct the Polak-Ribière coefficient update rule. -See also [`conjugate_gradient_descent`](@ref) +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +# See also +[`PolakRibiereCoefficient`](@ref), [`conjugate_gradient_descent`](@ref) """ -struct PolakRibiereCoefficient{TVTM<:AbstractVectorTransportMethod} <: DirectionUpdateRule - transport_method::TVTM - function PolakRibiereCoefficient(t::AbstractVectorTransportMethod) - return new{typeof(t)}(t) - end +struct PolakRibiereCoefficientRule{VTM<:AbstractVectorTransportMethod} <: + DirectionUpdateRule + vector_transport_method::VTM end -function PolakRibiereCoefficient(M::AbstractManifold=DefaultManifold(2)) - return PolakRibiereCoefficient(default_vector_transport_method(M)) +function PolakRibiereCoefficientRule( + M::AbstractManifold; vector_transport_method::VTM=default_vector_transport_method(M) +) where {VTM<:AbstractVectorTransportMethod} + return PolakRibiereCoefficientRule{VTM}(vector_transport_method) end -update_rule_storage_points(::PolakRibiereCoefficient) = Tuple{:Iterate} -update_rule_storage_vectors(::PolakRibiereCoefficient) = Tuple{:Gradient} -function 
(u::DirectionUpdateRuleStorage{<:PolakRibiereCoefficient})( +function (u::DirectionUpdateRuleStorage{<:PolakRibiereCoefficientRule})( amp::AbstractManoptProblem, cgs::ConjugateGradientDescentState, i ) M = get_manifold(amp) @@ -572,92 +705,168 @@ function (u::DirectionUpdateRuleStorage{<:PolakRibiereCoefficient})( p_old = get_storage(u.storage, PointStorageKey(:Iterate)) X_old = get_storage(u.storage, VectorStorageKey(:Gradient)) - gradienttr = vector_transport_to(M, p_old, X_old, cgs.p, u.coefficient.transport_method) + gradienttr = vector_transport_to( + M, p_old, X_old, cgs.p, u.coefficient.vector_transport_method + ) ν = cgs.X - gradienttr β = real(inner(M, cgs.p, cgs.X, ν)) / real(inner(M, p_old, X_old, X_old)) update_storage!(u.storage, amp, cgs) return max(zero(β), β) end -function show(io::IO, u::PolakRibiereCoefficient) - return print(io, "PolakRibiereCoefficient($(u.transport_method))") +function show(io::IO, u::PolakRibiereCoefficientRule) + return print( + io, + "Manopt.PolakRibiereCoefficientRule(; vector_transport_method=$(u.vector_transport_method))", + ) +end + +""" + PolakRibiereCoefficient(; kwargs...) + PolakRibiereCoefficient(M::AbstractManifold; kwargs...) + +Computes an update coefficient for the [`conjugate_gradient_descent`](@ref) algorithm based +on [PolakRibiere:1969](@cite) adapted to Riemannian manifolds. + +$(_doc_CG_notaion) +Let ``ν_k = X_{k+1} - $(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))X_k``, +where ``$(_math(:vector_transport, :symbol))`` denotes a vector transport. + +Then the coefficient reads + +```math +β_k = $(_tex(:frac, "⟨ X_{k+1}, ν_k ⟩_{p_{k+1}}", _tex(:norm, "X_k"; index="{p_k}")*"^2")). +``` + +# Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) + +$(_note(:ManifoldDefaultFactory, "PolakRibiereCoefficientRule")) +""" +function PolakRibiereCoefficient(args...; kwargs...) + return ManifoldDefaultsFactory(Manopt.PolakRibiereCoefficientRule, args...; kwargs...) 
end @doc raw""" - SteepestDirectionUpdateRule <: DirectionUpdateRule + SteepestDescentCoefficientRule <: DirectionUpdateRule -The simplest rule to update is to have no influence of the last direction and -hence return an update ``β = 0`` for all [`ConjugateGradientDescentState`](@ref)` cgds` +A functor `(problem, state, k) -> β_k` to compute the conjugate gradient update coefficient +to obtain the steepest direction, that is ``β_k=0``. -See also [`conjugate_gradient_descent`](@ref) +# Constructor + + SteepestDescentCoefficientRule() + +Construct the steepest descent coefficient update rule. + +# See also +[`SteepestDescentCoefficient`](@ref), [`conjugate_gradient_descent`](@ref) """ -struct SteepestDirectionUpdateRule <: DirectionUpdateRule end +struct SteepestDescentCoefficientRule <: DirectionUpdateRule end -update_rule_storage_points(::SteepestDirectionUpdateRule) = Tuple{} -update_rule_storage_vectors(::SteepestDirectionUpdateRule) = Tuple{} +update_rule_storage_points(::SteepestDescentCoefficientRule) = Tuple{} +update_rule_storage_vectors(::SteepestDescentCoefficientRule) = Tuple{} -function (u::DirectionUpdateRuleStorage{SteepestDirectionUpdateRule})( +function (u::DirectionUpdateRuleStorage{SteepestDescentCoefficientRule})( ::DefaultManoptProblem, ::ConjugateGradientDescentState, i ) return 0.0 end +@doc """ + SteepestDescentCoefficient() + SteepestDescentCoefficient(M::AbstractManifold) -@doc raw""" - ConjugateGradientBealeRestart <: DirectionUpdateRule +Computes an update coefficient for the [`conjugate_gradient_descent`](@ref) algorithm +so that it falls back to a [`gradient_descent`](@ref) method, that is +````math +β_k = 0 +```` -An update rule might require a restart, that is using pure gradient as descent direction, -if the last two gradients are nearly orthogonal, see [HagerZhang:2006; page 12](@cite) (in the preprint, page 46 in Journal page numbers). -This method is named after E. Beale from his proceedings paper in 1972 [Beale:1972](@cite).
-This method acts as a _decorator_ to any existing [`DirectionUpdateRule`](@ref) `direction_update`. +$(_note(:ManifoldDefaultFactory, "SteepestDescentCoefficientRule")) +""" +function SteepestDescentCoefficient() + return ManifoldDefaultsFactory( + Manopt.SteepestDescentCoefficientRule; requires_manifold=false + ) +end -When obtain from the [`ConjugateGradientDescentState`](@ref)` cgs` the last +@doc """ -``p_k,X_k`` and the current ``p_{k+1},X_{k+1}`` iterate and the gradient, respectively. + ConjugateGradientBealeRestartRule <: DirectionUpdateRule -Then a restart is performed, hence ``β_k = 0`` returned if +A functor `(problem, state, k) -> β_k` to compute the conjugate gradient update coefficient +based on a restart idea of [Beale:1972](@cite), following [HagerZhang:2006; page 12](@cite) +adapted to manifolds. -```math - \frac{ ⟨X_{k+1}, P_{p_{k+1}\gets p_k}X_k⟩}{\lVert X_k \rVert_{p_k}} > ξ, -``` -where ``P_{a\gets b}(⋅)`` denotes a vector transport from the tangent space at ``a`` to ``b``, -and ``ξ`` is the `threshold`. -The default threshold is chosen as `0.2` as recommended in [Powell:1977](@cite) +# Fields + +* `direction_update::DirectionUpdateRule`: the actual rule, that is restarted +* `threshold::Real`: a threshold for the restart check. +$(_var(:Field, :vector_transport_method)) # Constructor - ConjugateGradientBealeRestart( - direction_update::D, - threshold=0.2; - manifold::AbstractManifold = DefaultManifold(), - vector_transport_method::V=default_vector_transport_method(manifold), + ConjugateGradientBealeRestartRule( + direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; + kwargs... + ) + ConjugateGradientBealeRestartRule( + M::AbstractManifold=DefaultManifold(), + direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; + kwargs... ) + +Construct the Beale restart coefficient update rule adapted to manifolds.
+ +## Input + +$(_var(:Argument, :M; type=true)) + If this is not provided, the `DefaultManifold()` from $(_link(:ManifoldsBase)) is used. +* `direction_update`: a [`DirectionUpdateRule`](@ref) or a corresponding + [`ManifoldDefaultsFactory`](@ref) to produce such a rule. + +## Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) +* `threshold=0.2` + +# See also + +[`ConjugateGradientBealeRestart`](@ref), [`conjugate_gradient_descent`](@ref) """ -mutable struct ConjugateGradientBealeRestart{ - DUR<:DirectionUpdateRule,VT<:AbstractVectorTransportMethod,F +mutable struct ConjugateGradientBealeRestartRule{ + DUR<:DirectionUpdateRule,VT<:AbstractVectorTransportMethod,F<:Real } <: DirectionUpdateRule direction_update::DUR threshold::F vector_transport_method::VT - function ConjugateGradientBealeRestart( - direction_update::D, - threshold=0.2; - manifold::AbstractManifold=DefaultManifold(), - vector_transport_method::V=default_vector_transport_method(manifold), - ) where {D<:DirectionUpdateRule,V<:AbstractVectorTransportMethod} - return new{D,V,typeof(threshold)}( - direction_update, threshold, vector_transport_method - ) - end +end +function ConjugateGradientBealeRestartRule( + M::AbstractManifold, + direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; + threshold::F=0.2, + vector_transport_method::V=default_vector_transport_method(M), +) where {V<:AbstractVectorTransportMethod,F<:Real} + dir = _produce_type(direction_update, M) + return ConjugateGradientBealeRestartRule{typeof(dir),V,F}( + dir, threshold, vector_transport_method + ) +end +function ConjugateGradientBealeRestartRule( + direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; kwargs... +) + return ConjugateGradientBealeRestartRule(DefaultManifold(), direction_update; kwargs...) 
end -@inline function update_rule_storage_points(dur::ConjugateGradientBealeRestart) +@inline function update_rule_storage_points(dur::ConjugateGradientBealeRestartRule) dur_p = update_rule_storage_points(dur.direction_update) return :Iterate in dur_p.parameters ? dur_p : Tuple{:Iterate,dur_p.parameters...} end -@inline function update_rule_storage_vectors(dur::ConjugateGradientBealeRestart) +@inline function update_rule_storage_vectors(dur::ConjugateGradientBealeRestartRule) dur_X = update_rule_storage_vectors(dur.direction_update) return :Gradient in dur_X.parameters ? dur_X : Tuple{:Gradient,dur_X.parameters...} end -function (u::DirectionUpdateRuleStorage{<:ConjugateGradientBealeRestart})( +function (u::DirectionUpdateRuleStorage{<:ConjugateGradientBealeRestartRule})( amp::AbstractManoptProblem, cgs::ConjugateGradientDescentState, k ) M = get_manifold(amp) @@ -680,9 +889,49 @@ function (u::DirectionUpdateRuleStorage{<:ConjugateGradientBealeRestart})( update_storage!(u.storage, amp, cgs) return real(num / denom) > u.coefficient.threshold ? zero(β) : β end -function show(io::IO, u::ConjugateGradientBealeRestart) +function show(io::IO, u::ConjugateGradientBealeRestartRule) return print( io, - "ConjugateGradientBealeRestart($(u.direction_update), $(u.threshold); vector_transport_method=$(u.vector_transport_method))", + "Manopt.ConjugateGradientBealeRestartRule($(repr(u.direction_update)); threshold=$(u.threshold), vector_transport_method=$(u.vector_transport_method))", + ) +end + +""" + ConjugateGradientBealeRestart(direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; kwargs...) + ConjugateGradientBealeRestart(M::AbstractManifold, direction_update::Union{DirectionUpdateRule,ManifoldDefaultsFactory}; kwargs...) + +Compute a conjugate gradient coefficient with a potential restart, when two directions are +nearly orthogonal. See [HagerZhang:2006; page 12](@cite) (in the preprint, page 46 in Journal page numbers). +This method is named after E. 
Beale from his proceedings paper in 1972 [Beale:1972](@cite). +This method acts as a _decorator_ to any existing [`DirectionUpdateRule`](@ref) `direction_update`. + +$(_doc_CG_notaion) + +Then a restart is performed, hence ``β_k = 0`` returned if + +```math + $(_tex( + :frac, + "⟨X_{k+1}, $(_math(:vector_transport, :symbol, "p_{k+1}", "p_k"))X_k⟩", + _tex(:norm, "X_k", index="p_k") + )) > ε, +``` +where ``ε`` is the `threshold`, which is set by default to `0.2`, see [Powell:1977](@cite) + +## Input + +* `direction_update`: a [`DirectionUpdateRule`](@ref) or a corresponding + [`ManifoldDefaultsFactory`](@ref) to produce such a rule. + +## Keyword arguments + +$(_var(:Keyword, :vector_transport_method)) +* `threshold=0.2` + +$(_note(:ManifoldDefaultFactory, "ConjugateGradientBealeRestartRule")) +""" +function ConjugateGradientBealeRestart(args...; kwargs...) + return ManifoldDefaultsFactory( + Manopt.ConjugateGradientBealeRestartRule, args...; kwargs... ) end diff --git a/src/plans/conjugate_residual_plan.jl b/src/plans/conjugate_residual_plan.jl index 4b405fdfe8..d78f6c7769 100644 --- a/src/plans/conjugate_residual_plan.jl +++ b/src/plans/conjugate_residual_plan.jl @@ -3,7 +3,7 @@ # Objective. _doc_CR_cost = """ ```math -f(X) = $(_l_frac(1,2)) $(_l_norm(_l_cal("A")*"[X] + b","p"))^2,\\qquad X ∈ $(_l_TpM()), +f(X) = $(_tex(:frac, 1,2)) $(_tex(:norm, _tex(:Cal, "A")*"[X] + b"; index="p"))^2,\\qquad X ∈ $(_math(:TpM)), ``` """ @doc """ @@ -13,9 +13,9 @@ Model the objective $(_doc_CR_cost) -defined on the tangent space ``$(_l_TpM)`` at ``p`` on the manifold ``$(_l_M)``. +defined on the tangent space ``$(_math(:TpM))`` at ``p`` on the manifold ``$(_math(:M))``. -In other words this is an objective to solve ``$(_l_cal("A")) = -b(p)`` +In other words this is an objective to solve ``$(_tex(:Cal, "A")) = -b(p)`` for some linear symmetric operator and a vector function. 
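> **Editor's note:** the Polak–Ribière coefficient and the Beale-style restart changed in this diff can be sketched in plain Julia on Euclidean space, where the vector transport is the identity. The function names below are illustrative only (not Manopt API), and the orthogonality test uses one common normalization under that assumption:

```julia
using LinearAlgebra

# PR+ coefficient: β_k = max(0, ⟨X_new, X_new - X_old⟩ / ‖X_old‖²).
# On a manifold, X_old would first be vector-transported to the new tangent space.
function polak_ribiere_beta(X_new, X_old)
    ν = X_new - X_old
    β = dot(X_new, ν) / dot(X_old, X_old)
    return max(zero(β), β)
end

# Beale-style restart: fall back to β = 0 when successive gradients are far
# from orthogonal; 0.2 is the default threshold cited in the docstrings.
# The normalization by ‖X_old‖² is one common choice, assumed here.
function restarted_beta(X_new, X_old; threshold=0.2)
    ratio = abs(dot(X_new, X_old)) / dot(X_old, X_old)
    return ratio > threshold ? zero(ratio) : polak_ribiere_beta(X_new, X_old)
end
```

For orthogonal successive gradients the restart never triggers and the plain Polak–Ribière value is returned; once the gradients lose orthogonality, the rule degenerates to steepest descent for one step.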
Note the minus on the right hand side, which makes this objective especially tailored for (iteratively) solving Newton-like equations. @@ -47,9 +47,9 @@ function SymmetricLinearSystemObjective( return SymmetricLinearSystemObjective{E,TA,T}(A, b) end -function set_manopt_parameter!(slso::SymmetricLinearSystemObjective, symbol::Symbol, value) - set_manopt_parameter!(slso.A!!, symbol, value) - set_manopt_parameter!(slso.b!!, symbol, value) +function set_parameter!(slso::SymmetricLinearSystemObjective, symbol::Symbol, value) + set_parameter!(slso.A!!, symbol, value) + set_parameter!(slso.b!!, symbol, value) return slso end @@ -84,7 +84,7 @@ end @doc """ get_b(TpM::TangentSpace, slso::SymmetricLinearSystemObjective) -evaluate the stored value for computing the right hand side ``b`` in ``$(_l_cal("A"))=-b``. +evaluate the stored value for computing the right hand side ``b`` in ``$(_tex(:Cal, "A"))=-b``. """ function get_b( TpM::TangentSpace, slso::SymmetricLinearSystemObjective{AllocatingEvaluation} @@ -106,7 +106,7 @@ evaluate the gradient of $(_doc_CR_cost) -Which is ``$(_l_grad) f(X) = $(_l_cal("A"))[X]+b``. This can be computed in-place of `Y`. +Which is ``$(_tex(:grad)) f(X) = $(_tex(:Cal, "A"))[X]+b``. This can be computed in-place of `Y`. """ function get_gradient(TpM::TangentSpace, slso::SymmetricLinearSystemObjective, X) p = base_point(TpM) @@ -141,7 +141,7 @@ evaluate the Hessian of $(_doc_CR_cost) -Which is ``$(_l_Hess) f(X)[Y] = $(_l_cal("A"))[V]``. This can be computed in-place of `W`. +Which is ``$(_tex(:Hess)) f(X)[Y] = $(_tex(:Cal, "A"))[V]``. This can be computed in-place of `W`. """ function get_hessian( TpM::TangentSpace, slso::SymmetricLinearSystemObjective{AllocatingEvaluation}, X, V @@ -179,32 +179,30 @@ A state for the [`conjugate_residual`](@ref) solver. 
# Fields * `X::T`: the iterate -* `r::T`: the residual ``r = -b(p) - $(_l_cal("A"))(p)[X]`` +* `r::T`: the residual ``r = -b(p) - $(_tex(:Cal, "A"))(p)[X]`` * `d::T`: the conjugate direction -* `Ar::T`, `Ad::T`: storages for ``$(_l_cal("A"))(p)[d]``, ``$(_l_cal("A"))(p)[r]`` -* `rAr::R`: internal field for storing ``⟨ r, $(_l_cal("A"))(p)[r] ⟩`` +* `Ar::T`, `Ad::T`: storages for ``$(_tex(:Cal, "A"))(p)[d]``, ``$(_tex(:Cal, "A"))(p)[r]`` +* `rAr::R`: internal field for storing ``⟨ r, $(_tex(:Cal, "A"))(p)[r] ⟩`` * `α::R`: a step length * `β::R`: the conjugate coefficient -* `stop::TStop`: a [`StoppingCriterion`](@ref) for the solver +$(_var(:Field, :stopping_criterion, "stop")) # Constructor - function ConjugateResidualState(TpM::TangentSpace,slso::SymmetricLinearSystemObjective; - kwargs... - ) + ConjugateResidualState(TpM::TangentSpace,slso::SymmetricLinearSystemObjective; kwargs...) Initialise the state with default values. ## Keyword arguments -* `X``$(_link_rand("TpM"))` -* `r=-get_gradient(TpM, slso, X)` +* `r=-`[`get_gradient`](@ref)`(TpM, slso, X)` * `d=copy(TpM, r)` -* `Ar=get_hessian(TpM, slso, X, r)` +* `Ar=`[`get_hessian`](@ref)`(TpM, slso, X, r)` * `Ad=copy(TpM, Ar)` * `α::R=0.0` * `β::R=0.0` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`($(_link_manifold_dimension()))`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(1e-8)` +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(`$(_link(:manifold_dimension))`)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-8)`")) +$(_var(:Keyword, :X)) # See also @@ -292,15 +290,15 @@ Stop when the relative residual in the [`conjugate_residual`](@ref) is below a certain threshold, i.e.
```math -$(_l_ds)$(_l_frac(_l_norm("r^{(k)"),"c")) ≤ ε, +$(_tex(:displaystyle))$(_tex(:frac, _tex(:norm, "r^{(k)}"),"c")) ≤ ε, ``` -where ``c = $(_l_norm("b"))`` of the initial vector from the vector field in ``$(_l_cal("A"))(p)[X] + b(p) = 0_p``, +where ``c = $(_tex(:norm, "b"))`` is the norm of the initial vector from the vector field in ``$(_tex(:Cal, "A"))(p)[X] + b(p) = 0_p``, from the [`conjugate_residual`](@ref) # Fields -$(_field_at_iteration) +$(_var(:Field, :at_iteration)) * `c`: the initial norm * `ε`: the threshold * `norm_rk`: the last computed norm of the residual @@ -313,7 +311,7 @@ Initialise the stopping criterion. !!! note - The initial norm of the vector field ``c = $(_l_norm("b"))`` + The initial norm of the vector field ``c = $(_tex(:norm, "b"))`` that is stored internally is updated on initialisation, that is, if this stopping criterion is called with `k<=0`. """ diff --git a/src/plans/constrained_plan.jl b/src/plans/constrained_plan.jl index 5ed0a5869e..f66ce1396e 100644 --- a/src/plans/constrained_plan.jl +++ b/src/plans/constrained_plan.jl @@ -9,20 +9,16 @@ constraints of type `T`. """ abstract type AbstractConstrainedFunctor{T} end -function set_manopt_parameter!( - acf::AbstractConstrainedFunctor{T}, ::Val{:μ}, μ::T -) where {T} +function set_parameter!(acf::AbstractConstrainedFunctor{T}, ::Val{:μ}, μ::T) where {T} acf.μ = μ return acf end -get_manopt_parameter(acf::AbstractConstrainedFunctor, ::Val{:μ}) = acf.μ -function set_manopt_parameter!( - acf::AbstractConstrainedFunctor{T}, ::Val{:λ}, λ::T -) where {T} +get_parameter(acf::AbstractConstrainedFunctor, ::Val{:μ}) = acf.μ +function set_parameter!(acf::AbstractConstrainedFunctor{T}, ::Val{:λ}, λ::T) where {T} acf.λ = λ return acf end -get_manopt_parameter(acf::AbstractConstrainedFunctor, ::Val{:λ}) = acf.λ +get_parameter(acf::AbstractConstrainedFunctor, ::Val{:λ}) = acf.λ """ AbstractConstrainedSlackFunctor{T,R} @@ -37,27 +33,23 @@ which is also of type `T`.
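> **Editor's note:** the `set_parameter!`/`get_parameter` methods being renamed in this hunk all follow the standard Julia `Val{:symbol}` dispatch pattern. A self-contained sketch of that pattern (illustrative names, not the Manopt types):

```julia
# A Symbol is lifted to a compile-time Val so each field gets its own
# strongly-typed setter method, instead of a run-time `if sym == :μ` chain.
mutable struct ToySlackFunctor{T}
    μ::T
    s::T
end

set_toy_parameter!(f::ToySlackFunctor{T}, ::Val{:μ}, μ::T) where {T} = (f.μ = μ; f)
set_toy_parameter!(f::ToySlackFunctor{T}, ::Val{:s}, s::T) where {T} = (f.s = s; f)
# generic entry point: wrap the Symbol in a Val and dispatch
set_toy_parameter!(f::ToySlackFunctor, sym::Symbol, v) = set_toy_parameter!(f, Val(sym), v)

f = ToySlackFunctor(1.0, 2.0)
set_toy_parameter!(f, :μ, 3.0)   # dispatches to the Val{:μ} method; f.s is untouched
```

Note that `Val(sym)` with a run-time `Symbol` incurs dynamic dispatch; the pattern pays off when the `Val` is known at compile time, and otherwise simply keeps the per-field setters cleanly separated.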
""" abstract type AbstractConstrainedSlackFunctor{T,R} end -function set_manopt_parameter!( - acsf::AbstractConstrainedSlackFunctor{T}, ::Val{:s}, s::T -) where {T} +function set_parameter!(acsf::AbstractConstrainedSlackFunctor{T}, ::Val{:s}, s::T) where {T} acsf.s = s return acsf end -get_manopt_parameter(acsf::AbstractConstrainedSlackFunctor, ::Val{:s}) = acsf.s -function set_manopt_parameter!( - acsf::AbstractConstrainedSlackFunctor{T}, ::Val{:μ}, μ::T -) where {T} +get_parameter(acsf::AbstractConstrainedSlackFunctor, ::Val{:s}) = acsf.s +function set_parameter!(acsf::AbstractConstrainedSlackFunctor{T}, ::Val{:μ}, μ::T) where {T} acsf.μ = μ return acsf end -get_manopt_parameter(acsf::AbstractConstrainedSlackFunctor, ::Val{:μ}) = acsf.μ -function set_manopt_parameter!( +get_parameter(acsf::AbstractConstrainedSlackFunctor, ::Val{:μ}) = acsf.μ +function set_parameter!( acsf::AbstractConstrainedSlackFunctor{T,R}, ::Val{:β}, β::R ) where {T,R} acsf.β = β return acsf end -get_manopt_parameter(acsf::AbstractConstrainedSlackFunctor, ::Val{:β}) = acsf.β +get_parameter(acsf::AbstractConstrainedSlackFunctor, ::Val{:β}) = acsf.β @doc raw""" ConstrainedManifoldObjective{T<:AbstractEvaluationType, C<:ConstraintType} <: AbstractManifoldObjective{T} @@ -579,24 +571,6 @@ within the [`ConstrainedManifoldObjective`](@ref). 
""" get_unconstrained_objective(co::ConstrainedManifoldObjective) = co.objective -function get_constraints(mp::AbstractManoptProblem, p) - Base.depwarn( - "get_constraints will be removed in a future release, use `get_equality_constraint($mp, $p, :)` and `get_inequality_constraint($mp, $p, :)`, respectively", - :get_constraints, - ) - return [ - get_inequality_constraint(get_manifold(mp), get_objective(mp), p, :), - get_equality_constraint(get_manifold(mp), get_objective(mp), p, :), - ] -end -function get_constraints(M::AbstractManifold, co::ConstrainedManifoldObjective, p) - Base.depwarn( - "get_constraints will be removed in a future release, use `get_equality_constraint($M, $co, $p, :)` and `get_inequality_constraint($M, $co, $p, :)`, respectively", - :get_constraints, - ) - return [get_inequality_constraint(M, co, p, :), get_equality_constraint(M, co, p, :)] -end - function get_cost(M::AbstractManifold, co::ConstrainedManifoldObjective, p) return get_cost(M, co.objective, p) end @@ -604,22 +578,6 @@ function get_cost_function(co::ConstrainedManifoldObjective, recursive=false) return get_cost_function(co.objective, recursive) end -Base.@deprecate get_equality_constraints(amp::AbstractManoptProblem, p) get_equality_constraint( - amp, p, :, -) - -Base.@deprecate get_equality_constraints!(amp::AbstractManoptProblem, X, p) get_equality_constraint!( - amp, X, p, :, -) - -Base.@deprecate get_equality_constraints( - M::AbstractManifold, co::AbstractManifoldObjective, p -) get_equality_constraint(M, co, p, :) - -Base.@deprecate get_equality_constraints!( - M::AbstractManifold, X, co::AbstractManifoldObjective, p -) get_equality_constraint!(M, X, co, p, :) - @doc raw""" get_equality_constraint(amp::AbstractManoptProblem, p, j=:) get_equality_constraint(M::AbstractManifold, objective, p, j=:) @@ -659,13 +617,6 @@ function get_gradient_function(co::ConstrainedManifoldObjective, recursive=false return get_gradient_function(co.objective, recursive) end -Base.@deprecate 
get_inequality_constraints(amp::AbstractManoptProblem, p) get_inequality_constraint( - amp, p, :, -) -Base.@deprecate get_inequality_constraints( - M::AbstractManifold, co::AbstractManifoldObjective, p -) get_inequality_constraint(M, co, p, :) - @doc raw""" get_inequality_constraint(amp::AbstractManoptProblem, p, j=:) get_inequality_constraint(M::AbstractManifold, co::ConstrainedManifoldObjective, p, j=:, range=NestedPowerRepresentation()) @@ -772,20 +723,6 @@ function get_grad_equality_constraint!( return get_gradient!(M, X, co.equality_constraints, p, j, range) end -# Deprecate plurals -Base.@deprecate get_grad_equality_constraints(mp::AbstractManoptProblem, p) get_grad_equality_constraint( - mp, p, :, -) -Base.@deprecate get_grad_equality_constraints( - M::AbstractManifold, co::AbstractManifoldObjective, p -) get_grad_equality_constraint(M, co, p, :) -Base.@deprecate get_grad_equality_constraints!(mp::AbstractManoptProblem, X, p) get_grad_equality_constraint!( - mp, X, p, :, -) -Base.@deprecate get_grad_equality_constraints!( - M::AbstractManifold, X, co::AbstractManifoldObjective, p -) get_grad_equality_constraint!(M, X, co, p, :) - @doc raw""" get_grad_inequality_constraint(amp::AbstractManoptProblem, p, j=:) get_grad_inequality_constraint(M::AbstractManifold, co::ConstrainedManifoldObjective, p, j=:, range=NestedPowerRepresentation()) @@ -859,20 +796,6 @@ function get_grad_inequality_constraint!( return get_gradient!(M, X, co.inequality_constraints, p, j, range) end -#Deprecate plurals -Base.@deprecate get_grad_inequality_constraints(mp::AbstractManoptProblem, p) get_grad_inequality_constraint( - mp, p, :, -) -Base.@deprecate get_grad_inequality_constraints( - M::AbstractManifold, co::AbstractManifoldObjective, p -) get_grad_inequality_constraint(M, co, p, :) -Base.@deprecate get_grad_inequality_constraints!(mp::AbstractManoptProblem, X, p) get_grad_inequality_constraint!( - mp, X, p, :, -) -Base.@deprecate get_grad_inequality_constraints!( - 
M::AbstractManifold, X, co::AbstractManifoldObjective, p -) get_grad_inequality_constraint!(M, X, co, p, :) - function get_hessian(M::AbstractManifold, co::ConstrainedManifoldObjective, p, X) return get_hessian(M, co.objective, p, X) end diff --git a/src/plans/debug.jl b/src/plans/debug.jl index cf46b21d1a..298eb1d2b2 100644 --- a/src/plans/debug.jl +++ b/src/plans/debug.jl @@ -67,23 +67,26 @@ function DebugSolverState( end """ - set_manopt_parameter!(ams::DebugSolverState, ::Val{:Debug}, args...) + set_parameter!(ams::DebugSolverState, ::Val{:Debug}, args...) Set certain values specified by `args...` into the elements of the `debugDictionary` """ -function set_manopt_parameter!(dss::DebugSolverState, ::Val{:Debug}, args...) +function set_parameter!(dss::DebugSolverState, ::Val{:Debug}, args...) for d in values(dss.debugDictionary) - set_manopt_parameter!(d, args...) + set_parameter!(d, args...) end return dss end # all other pass through -function set_manopt_parameter!(dss::DebugSolverState, v::Val{T}, args...) where {T} - return set_manopt_parameter!(dss.state, v, args...) +function set_parameter!(dss::DebugSolverState, v::Val{T}, args...) where {T} + return set_parameter!(dss.state, v, args...) +end +function set_parameter!(dss::DebugSolverState, v::Val{:StoppingCriterion}, args...) + return set_parameter!(dss.state, v, args...) end # all other pass through -function get_manopt_parameter(dss::DebugSolverState, v::Val{T}, args...) where {T} - return get_manopt_parameter(dss.state, v, args...) +function get_parameter(dss::DebugSolverState, v::Val{T}, args...) where {T} + return get_parameter(dss.state, v, args...) end function status_summary(dst::DebugSolverState) @@ -136,14 +139,14 @@ function show(io::IO, dg::DebugGroup) s = join(["$(di)" for di in dg.group], ", ") return print(io, "DebugGroup([$s])") end -function set_manopt_parameter!(dg::DebugGroup, v::Val, args...) +function set_parameter!(dg::DebugGroup, v::Val, args...) 
for di in dg.group - set_manopt_parameter!(di, v, args...) + set_parameter!(di, v, args...) end return dg end -function set_manopt_parameter!(dg::DebugGroup, e::Symbol, args...) - set_manopt_parameter!(dg, Val(e), args...) +function set_parameter!(dg::DebugGroup, e::Symbol, args...) + set_parameter!(dg, Val(e), args...) return dg end @@ -183,7 +186,7 @@ function (d::DebugEvery)(p::AbstractManoptProblem, st::AbstractManoptSolverState d.debug(p, st, -1) end # set activity for this iterate in subsolvers - set_manopt_parameter!( + set_parameter!( st, :SubState, :Debug, @@ -207,12 +210,12 @@ function status_summary(de::DebugEvery) end return "[$s, $(de.every)]" end -function set_manopt_parameter!(de::DebugEvery, e::Symbol, args...) - set_manopt_parameter!(de, Val(e), args...) +function set_parameter!(de::DebugEvery, e::Symbol, args...) + set_parameter!(de, Val(e), args...) return de end -function set_manopt_parameter!(de::DebugEvery, args...) - set_manopt_parameter!(de.debug, args...) +function set_parameter!(de::DebugEvery, args...) + set_parameter!(de.debug, args...) return de end @@ -230,8 +233,7 @@ during the last iteration. See [`DebugEntryChange`](@ref) for the general case * `storage=`[`StoreStateAction`](@ref)`( [:Gradient] )` storage of the previous action * `prefix="Last Change:"`: prefix of the debug output (ignored if you set `format`) * `io=stdout`: default stream to print the debug to. -* $_kw_inverse_retraction_method_default: - $_kw_inverse_retraction_method +$(_var(:Keyword, :inverse_retraction_method)) the inverse retraction to be used for approximating distance. 
@@ -247,22 +249,12 @@ mutable struct DebugChange{IR<:AbstractInverseRetractionMethod} <: DebugAction io::IO=stdout, prefix::String="Last Change: ", format::String="$(prefix)%f", - manifold::Union{Nothing,AbstractManifold}=nothing, - invretr::Union{Nothing,AbstractInverseRetractionMethod}=nothing, inverse_retraction_method::AbstractInverseRetractionMethod=default_inverse_retraction_method( M ), ) irm = inverse_retraction_method # Deprecated, remove in Manopt 0.5 - if !isnothing(manifold) - @warn "The `manifold` keyword is deprecated, use the first positional argument `M`. This keyword for now sets `inverse_retracion_method`." - irm = default_inverse_retraction_method(manifold) - end - if !isnothing(invretr) - @warn "invretr keyword is deprecated, use `inverse_retraction_method`, which this one overrides for now." - irm = invretr - end if isnothing(storage) if M isa DefaultManifold storage = StoreStateAction(M; store_fields=[:Iterate]) @@ -844,11 +836,11 @@ end function status_summary(dwa::DebugWhenActive) return repr(dwa) end -function set_manopt_parameter!(dwa::DebugWhenActive, v::Val, args...) - set_manopt_parameter!(dwa.debug, v, args...) +function set_parameter!(dwa::DebugWhenActive, v::Val, args...) + set_parameter!(dwa.debug, v, args...) return dwa end -function set_manopt_parameter!(dwa::DebugWhenActive, ::Val{:Activity}, v) +function set_parameter!(dwa::DebugWhenActive, ::Val{:Activity}, v) return dwa.active = v end @@ -974,7 +966,7 @@ function (d::DebugWarnIfCostIncreases)( @warn """ You seem to be running a `gradient_descent` with a `ConstantStepsize`. Maybe consider to use `ArmijoLinesearch` (if applicable) or use - `ConstantStepsize(value)` with a `value` less than $(get_last_stepsize(p,st,k)). + `ConstantLength(value)` with a `value` less than $(get_last_stepsize(p,st,k)). """ end if d.status === :Once @@ -1171,20 +1163,26 @@ one are called with an `i=0` for reset. 1. 
Providing a simple vector of symbols, numbers and strings like - [:Iterate, " | ", :Cost, :Stop, 10] + ``` + [:Iterate, " | ", :Cost, :Stop, 10] + ``` -Adds a group to :Iteration of three actions ([`DebugIteration`](@ref), [`DebugDivider`](@ref)`(" | "), and[`DebugCost`](@ref)) -as a [`DebugGroup`](@ref) inside an [`DebugEvery`](@ref) to only be executed every 10th iteration. -It also adds the [`DebugStoppingCriterion`](@ref) to the `:EndAlgorhtm` entry of the dictionary. + Adds a group to :Iteration of three actions ([`DebugIteration`](@ref), [`DebugDivider`](@ref)`(" | ")`, and [`DebugCost`](@ref)) + as a [`DebugGroup`](@ref) inside an [`DebugEvery`](@ref) to only be executed every 10th iteration. + It also adds the [`DebugStoppingCriterion`](@ref) to the `:EndAlgorithm` entry of the dictionary. 2. The same can also be written a bit more precisely as - DebugFactory([:Iteration => [:Iterate, " | ", :Cost, 10], :Stop]) + ``` + DebugFactory([:Iteration => [:Iterate, " | ", :Cost, 10], :Stop]) + ``` 3. We can even make the stopping criterion concrete and pass Actions directly, - for example explicitly Making the stop more concrete, we get + for example, making the stop more concrete, we get - DebugFactory([:Iteration => [:Iterate, " | ", DebugCost(), 10], :Stop => [:Stop]]) + ``` + DebugFactory([:Iteration => [:Iterate, " | ", DebugCost(), 10], :Stop => [:Stop]]) + ``` """ function DebugFactory(a::Vector{<:Any}) entries = filter(x -> !isa(x, Pair) && (x ∉ [:Stop, :WhenActive]) && !isa(x, Int), a) @@ -1225,14 +1223,14 @@ end @doc raw""" - DebugGroupFactory(a::Vector) + DebugGroupFactory(a::Vector) -Generate a [`DebugGroup`] of [`DebugAction`](@ref)s. The following rules are used +Generate a [`DebugGroup`](@ref) of [`DebugAction`](@ref)s. The following rules are used 1. Any `Symbol` is passed to [`DebugActionFactory`](@ref DebugActionFactory(::Symbol)) -2.
Any `(Symbol, String)` generates similar actions as in 1., but the string is used for `format=``, - see [`DebugActionFactory`](@ref DebugActionFactory(::Tuple{Symbol,String})) -3. Any `String` is passed to `DebugActionFactory(d::String)`](@ref)` +2. Any `(Symbol, String)` generates similar actions as in 1., but the string is used for `format=`, + see [`DebugActionFactory`](@ref DebugActionFactory(::Tuple{Symbol,String})) +3. Any `String` is passed to [`DebugActionFactory`](@ref DebugActionFactory(d::String)) 4. Any [`DebugAction`](@ref) is included as is. If this results in more than one [`DebugAction`](@ref), a [`DebugGroup`](@ref) of these is built. diff --git a/src/plans/difference_of_convex_plan.jl b/src/plans/difference_of_convex_plan.jl index 84adff0898..1328d297c3 100644 --- a/src/plans/difference_of_convex_plan.jl +++ b/src/plans/difference_of_convex_plan.jl @@ -112,8 +112,8 @@ that are stored within this functor as well. * `Xk` a tangent vector at `pk` Both interim values can be set using -`set_manopt_parameter!(::LinearizedDCCost, ::Val{:p}, p)` -and `set_manopt_parameter!(::LinearizedDCCost, ::Val{:X}, X)`, respectively. +`set_parameter!(::LinearizedDCCost, ::Val{:p}, p)` +and `set_parameter!(::LinearizedDCCost, ::Val{:X}, X)`, respectively.
# Constructor LinearizedDCCost(g, p, X) @@ -125,11 +125,11 @@ mutable struct LinearizedDCCost{P,T,TG} end (F::LinearizedDCCost)(M, p) = F.g(M, p) - inner(M, F.pk, F.Xk, log(M, F.pk, p)) -function set_manopt_parameter!(ldc::LinearizedDCCost, ::Val{:p}, p) +function set_parameter!(ldc::LinearizedDCCost, ::Val{:p}, p) ldc.pk .= p return ldc end -function set_manopt_parameter!(ldc::LinearizedDCCost, ::Val{:X}, X) +function set_parameter!(ldc::LinearizedDCCost, ::Val{:X}, X) ldc.Xk .= X return ldc end @@ -160,8 +160,8 @@ for a point `pk` and a tangent vector `Xk` at `pk` (the outer iterates) that are * `Xk` a tangent vector at `pk` Both interim values can be set using -`set_manopt_parameter!(::LinearizedDCGrad, ::Val{:p}, p)` -and `set_manopt_parameter!(::LinearizedDCGrad, ::Val{:X}, X)`, respectively. +`set_parameter!(::LinearizedDCGrad, ::Val{:p}, p)` +and `set_parameter!(::LinearizedDCGrad, ::Val{:X}, X)`, respectively. # Constructor LinearizedDCGrad(grad_g, p, X; evaluation=AllocatingEvaluation()) @@ -199,11 +199,11 @@ function (grad_f!::LinearizedDCGrad{InplaceEvaluation})(M, p) return X end -function set_manopt_parameter!(ldcg::LinearizedDCGrad, ::Val{:p}, p) +function set_parameter!(ldcg::LinearizedDCGrad, ::Val{:p}, p) ldcg.pk .= p return ldcg end -function set_manopt_parameter!(ldcg::LinearizedDCGrad, ::Val{:X}, X) +function set_parameter!(ldcg::LinearizedDCGrad, ::Val{:X}, X) ldcg.Xk .= X return ldcg end @@ -319,8 +319,8 @@ for a point `pk` and a proximal parameter ``λ``. * `λ` - the prox parameter Both interim values can be set using -`set_manopt_parameter!(::ProximalDCCost, ::Val{:p}, p)` -and `set_manopt_parameter!(::ProximalDCCost, ::Val{:λ}, λ)`, respectively. +`set_parameter!(::ProximalDCCost, ::Val{:p}, p)` +and `set_parameter!(::ProximalDCCost, ::Val{:λ}, λ)`, respectively. 
# Constructor @@ -333,11 +333,11 @@ mutable struct ProximalDCCost{P,TG,R} end (F::ProximalDCCost)(M, p) = 1 / (2 * F.λ) * distance(M, p, F.pk)^2 + F.g(M, p) -function set_manopt_parameter!(pdcc::ProximalDCCost, ::Val{:p}, p) +function set_parameter!(pdcc::ProximalDCCost, ::Val{:p}, p) pdcc.pk .= p return pdcc end -function set_manopt_parameter!(pdcc::ProximalDCCost, ::Val{:λ}, λ) +function set_parameter!(pdcc::ProximalDCCost, ::Val{:λ}, λ) pdcc.λ = λ return pdcc end @@ -367,8 +367,8 @@ for a point `pk` and a proximal parameter `λ`. * `λ` - the prox parameter Both interim values can be set using -`set_manopt_parameter!(::ProximalDCGrad, ::Val{:p}, p)` -and `set_manopt_parameter!(::ProximalDCGrad, ::Val{:λ}, λ)`, respectively. +`set_parameter!(::ProximalDCGrad, ::Val{:p}, p)` +and `set_parameter!(::ProximalDCGrad, ::Val{:λ}, λ)`, respectively. # Constructor @@ -405,11 +405,11 @@ function (grad_f!::ProximalDCGrad{InplaceEvaluation})(M, p) X .-= 1 / grad_f!.λ * log(M, p, grad_f!.pk) return X end -function set_manopt_parameter!(pdcg::ProximalDCGrad, ::Val{:p}, p) +function set_parameter!(pdcg::ProximalDCGrad, ::Val{:p}, p) pdcg.pk .= p return pdcg end -function set_manopt_parameter!(pdcg::ProximalDCGrad, ::Val{:λ}, λ) +function set_parameter!(pdcg::ProximalDCGrad, ::Val{:λ}, λ) pdcg.λ = λ return pdcg end diff --git a/src/plans/docstring_snippets.jl b/src/plans/docstring_snippets.jl deleted file mode 100644 index 7a5cc829bc..0000000000 --- a/src/plans/docstring_snippets.jl +++ /dev/null @@ -1,167 +0,0 @@ -# -# -# This file collects a few strings to be reused in documentation to avoid retyping everything - -# LateX symbols -_l_cal(letter::String) = raw"\mathcal " * "$letter" -_l_cO = raw"\mathcal O" -_l_ds = raw"\displaystyle" -_l_argmin = raw"\operatorname{arg\,min}" -_l_frac(a, b) = raw"\frac" * "{$a}{$b}" -_l_grad = raw"\operatorname{grad}" -_l_Hess = raw"\operatorname{Hess}" -_l_log = raw"\log" -_l_prox = raw"\operatorname{prox}" -_l_refl = 
raw"\operatorname{refl}_p(x) = \operatorname{retr}_p(-\operatorname{retr}^{-1}_p x)" -_l_subgrad = raw"∂" -_l_min = raw"\min" -_l_max = raw"\min" -_l_norm(v, i="") = raw"\lVert" * "$v" * raw"\rVert" * "_{$i}" -# Semantics -_l_Manifold(M="M") = _l_cal(M) -_l_M = "$(_l_Manifold())" -_l_TpM(p="p") = "T_{$p}$_l_M" -_l_DΛ = "DΛ: T_{m}$(_l_M) → T_{Λ(m)}$(_l_Manifold("N"))" -_l_grad_long = raw"\operatorname{grad} f: \mathcal M → T\mathcal M" -_l_Hess_long = "$_l_Hess f(p)[⋅]: $(_l_TpM()) → $(_l_TpM())" -_l_retr = raw"\operatorname{retr}" -_l_retr_long = raw"\operatorname{retr}: T\mathcal M \to \mathcal M" -_l_vt = raw"\mathcal T_{\cdot\gets\cdot}" -_l_C_subset_M = "$(_l_cal("C")) ⊂ $(_l_cal("M"))" -_l_txt(s) = "\\text{$s}" - -# Math terms -_math_VT = raw"a vector transport ``T``" -_math_inv_retr = "an inverse retraction ``$_l_retr^{-1}``" -_math_retr = " a retraction $_l_retr" -_math_reflect = raw""" -```math - \operatorname{refl}_p(x) = \operatorname{retr}_p(-\operatorname{retr}^{-1}_p x), -``` -where ``\operatorname{retr}`` and ``\operatorname{retr}^{-1}`` denote a retraction and an inverse -retraction, respectively. -""" -function _math_sequence(name, index, i_start=1, i_end="n") - return "\\{$(name)_{$index}\\}_{i=$(i_start)}^{$i_end}" -end - -_problem_default = raw""" -```math -\operatorname*{arg\,min}_{p ∈ \mathcal M} f(p) -``` -""" - -_problem_constrained = raw"""```math -\begin{aligned} -\min_{p ∈\mathcal{M}} &f(p)\\ -\text{subject to } &g_i(p)\leq 0 \quad \text{ for } i= 1, …, m,\\ -\quad &h_j(p)=0 \quad \text{ for } j=1,…,n, -\end{aligned} -``` -""" - -# Arguments of functions -_arg_alt_mgo = raw""" -Alternatively to `f` and `grad_f` you can provide -the [`AbstractManifoldGradientObjective`](@ref) `gradient_objective` directly. 
-""" - -# Arguments -_arg_f = raw"* `f`: a cost function ``f: \mathcal M→ℝ`` implemented as `(M, p) -> v`" -_arg_grad_f = raw""" -* `grad_f`: the gradient ``\operatorname{grad}f: \mathcal M → T\mathcal M`` of f - as a function `(M, p) -> X` or a function `(M, X, p) -> X` computing `X` in-place -""" -_arg_Hess_f = """ -* `Hess_f`: the Hessian ``$_l_Hess_long`` of f - as a function `(M, p, X) -> Y` or a function `(M, Y, p, X) -> Y` computing `Y` in-place -""" -_arg_p = raw"* `p` an initial value `p` ``= p^{(0)} ∈ \mathcal M``" -_arg_M = "* `M` a manifold ``$_l_M``" -_arg_inline_M = "the manifold `M`" -_arg_X = "* `X` a tangent vector" -_arg_sub_problem = "* `sub_problem` a [`AbstractManoptProblem`](@ref) to specify a problem for a solver or a closed form solution function." -_arg_sub_state = "* `sub_state` a [`AbstractManoptSolverState`](@ref) for the `sub_problem` or a [`AbstractEvaluationType`](@ref) if a closed form solution is provided." -_arg_subgrad_f = raw""" -* `∂f`: the subgradient ``∂f: \mathcal M → T\mathcal M`` of f - as a function `(M, p) -> X` or a function `(M, X, p) -> X` computing `X` in-place. - This function should always only return one element from the subgradient. -""" - -_doc_remark_tutorial_debug = "If you activate tutorial mode (cf. [`is_tutorial_mode`](@ref)), this solver provides additional debug warnings." -_doc_sec_output = """ -# Output - -The obtained approximate minimizer ``p^*``. -To obtain the whole final state of the solver, see [`get_solver_return`](@ref) for details, especially the `return_state=` keyword. -""" - -_sc_any = "[` | `](@ref StopWhenAny)" -_sc_all = "[` & `](@ref StopWhenAll)" - -# Fields -_field_at_iteration = "`at_iteration`: an integer indicating at which the stopping criterion last indicted to stop, which might also be before the solver started (`0`). 
Any negative value indicates that this was not yet the case; " -_field_iterate = "`p`: the current iterate ``p=p^{(k)} ∈ $(_l_M)``" -_field_gradient = "`X`: the current gradient ``$(_l_grad)f(p^{(k)}) ∈ T_p$(_l_M)``" -_field_subgradient = "`X` : the current subgradient ``$(_l_subgrad)f(p^{(k)}) ∈ T_p$_l_M``" -_field_inv_retr = "`inverse_retraction_method::`[`AbstractInverseRetractionMethod`](@extref `ManifoldsBase.AbstractInverseRetractionMethod`) : an inverse retraction ``$(_l_retr)^{-1}``" -_field_p = raw"`p`, an initial value `p` ``= p^{(0)} ∈ \mathcal M``" -_field_retr = "`retraction_method::`[`AbstractRetractionMethod`](@extref `ManifoldsBase.AbstractRetractionMethod`) : a retraction ``$(_l_retr_long)``" -_field_sub_problem = "`sub_problem::Union{`[`AbstractManoptProblem`](@ref)`, F}`: a manopt problem or a function for a closed form solution of the sub problem" -_field_sub_state = "`sub_state::Union{`[`AbstractManoptSolverState`](@ref)`,`[`AbstractEvaluationType`](@ref)`}`: for a sub problem state which solver to use, for the closed form solution function, indicate, whether the closed form solution function works with [`AllocatingEvaluation`](@ref)) `(M, p, X) -> q` or with an [`InplaceEvaluation`](@ref)) `(M, q, p, X) -> q`" -_field_stop = "`stop::`[`StoppingCriterion`](@ref) : a functor indicating when to stop and whether the algorithm has stopped" -_field_step = "`stepsize::`[`Stepsize`](@ref) : a stepsize." 
-_field_vector_transp = "`vector_transport_method::`[`AbstractVectorTransportMethod`](@extref `ManifoldsBase.AbstractVectorTransportMethod`) : a vector transport ``$_l_vt``" -_field_X = "`X`: a tangent vector" - -# Keywords -_kw_evaluation_default = "`evaluation=`[`AllocatingEvaluation`](@ref)`()`" -_kw_evaluation = "specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result ([`AllocatingEvaluation`](@ref)) or whether they modify their input argument to return the result therein ([`InplaceEvaluation`](@ref)). Since usually the first argument is the manifold, the modified argument is the second." -_kw_evaluation_example = "For example `grad_f(M,p)` allocates, but `grad_f!(M, X, p)` computes the result in-place of `X`." - -_kw_inverse_retraction_method_default = "`inverse_retraction_method=`[`default_inverse_retraction_method`](@extref `ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}`)`(M, typeof(p))`" -_kw_inverse_retraction_method = "an inverse retraction ``$(_l_retr)^{-1}`` to use, see [the section on retractions and their inverses](@extref ManifoldsBase :doc:`retractions`)." - -_kw_others = raw""" -All other keyword arguments are passed to [`decorate_state!`](@ref) for state decorators or -[`decorate_objective!`](@ref) for objective, respectively. -""" - -_kw_retraction_method_default = raw"`retraction_method=`[`default_retraction_method`](@extref `ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}`)`(M, typeof(p))`" -_kw_retraction_method = "a retraction ``$(_l_retr)`` to use, see [the section on retractions](@extref ManifoldsBase :doc:`retractions`)." - -_kw_stepsize = raw"a functor inheriting from [`Stepsize`](@ref) to determine a step size" - -_kw_stopping_criterion = raw"a functor inheriting from [`StoppingCriterion`](@ref) indicating when to stop." -_kw_stop_note = "is used to set the field `stop`." 
- -_kw_sub_kwargs_default = "`sub_kwargs=(;)`" -_kw_sub_kwargs = "a named tuple of keyword arguments that are passed to [`decorate_objective!`](@ref) of the sub solvers objective, the [`decorate_state!`](@ref) of the subsovlers state, and the sub state constructor itself." - -_kw_sub_objective = "a shortcut to modify the objective of the subproblem used within in the `sub_problem=` keyword" -function _kw_sub_objective_default_text(type::String) - return "By default, this is initialized as a [`$type`](@ref), which can further be decorated by using the `sub_kwargs=` keyword" -end - -_kw_vector_transport_method_default = "`vector_transport_method=`[`default_vector_transport_method`](@extref `ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}`)`(M, typeof(p))`" -_kw_vector_transport_method = "a vector transport ``$_l_vt`` to use, see [the section on vector transports](@extref ManifoldsBase :doc:`vector_transports`)." - -_kw_X_default = raw"`X=`[`zero_vector`](@extref `ManifoldsBase.zero_vector-Tuple{AbstractManifold, Any}`)`(M,p)`" -_kw_X = raw"specify a memory internally to store a tangent vector" - -function _kw_used_in(s::String) - return "This is used to define the `$s=` keyword and has hence no effect, if you set `$s` directly." -end - -function _link_zero_vector(M="M", p="p") - arg = length(M) > 0 ? "`($M, $p)`" : "" - return "[`zero_vector`](@extref `ManifoldsBase.zero_vector-Tuple{AbstractManifold, Any}`)$arg" -end -function _link_manifold_dimension(M="M") - arg = length(M) > 0 ? "`($M)`" : "" - return "[`manifold_dimension`](@extref `ManifoldsBase.manifold_dimension-Tuple{AbstractManifold}`)$arg" -end -function _link_rand(M="M") - arg = length(M) > 0 ? 
"`($M)`" : "" - return "[`rand`](@extref Base.rand-Tuple{AbstractManifold})$arg" -end diff --git a/src/plans/exact_penalty_method_plan.jl b/src/plans/exact_penalty_method_plan.jl index 1a3133080b..d5cf0daa44 100644 --- a/src/plans/exact_penalty_method_plan.jl +++ b/src/plans/exact_penalty_method_plan.jl @@ -62,11 +62,11 @@ function ExactPenaltyCost( ) where {R} return ExactPenaltyCost{typeof(smoothing),typeof(co),R}(co, ρ, u) end -function set_manopt_parameter!(epc::ExactPenaltyCost, ::Val{:ρ}, ρ) +function set_parameter!(epc::ExactPenaltyCost, ::Val{:ρ}, ρ) epc.ρ = ρ return epc end -function set_manopt_parameter!(epc::ExactPenaltyCost, ::Val{:u}, u) +function set_parameter!(epc::ExactPenaltyCost, ::Val{:u}, u) epc.u = u return epc end @@ -115,11 +115,11 @@ mutable struct ExactPenaltyGrad{S,CO,R} ρ::R u::R end -function set_manopt_parameter!(epg::ExactPenaltyGrad, ::Val{:ρ}, ρ) +function set_parameter!(epg::ExactPenaltyGrad, ::Val{:ρ}, ρ) epg.ρ = ρ return epg end -function set_manopt_parameter!(epg::ExactPenaltyGrad, ::Val{:u}, u) +function set_parameter!(epg::ExactPenaltyGrad, ::Val{:u}, u) epg.u = u return epg end diff --git a/src/plans/gradient_plan.jl b/src/plans/gradient_plan.jl index d4ad0163ea..f31cc0665d 100644 --- a/src/plans/gradient_plan.jl +++ b/src/plans/gradient_plan.jl @@ -221,13 +221,6 @@ function get_subgradient!( return get_gradient!(M, X, agmo, p) end -function _to_mutating_gradient(grad_f, evaluation::AllocatingEvaluation) - return grad_f_(M, p) = [grad_f(M, p[])] -end -function _to_mutating_gradient(grad_f, evaluation::InplaceEvaluation) - return grad_f_(M, X, p) = (X .= [grad_f(M, p[])]) -end - @doc raw""" get_gradient(agst::AbstractGradientSolverState) @@ -280,62 +273,70 @@ abstract type DirectionUpdateRule end IdentityUpdateRule <: DirectionUpdateRule The default gradient direction update is the identity, usually it just evaluates the gradient. 
+ +You can also use `Gradient()` to create the corresponding factory, though this only delays +this parameter-free instantiation to later. """ struct IdentityUpdateRule <: DirectionUpdateRule end +Gradient() = ManifoldDefaultsFactory(Manopt.IdentityUpdateRule; requires_manifold=false) """ - MomentumGradient <: DirectionUpdateRule + MomentumGradientRule <: DirectionUpdateRule -Append a momentum to a gradient processor, where the last direction and last iterate are -stored and the new is composed as ``η_i = m*η_{i-1}' - s d_i``, -where ``sd_i`` is the current (inner) direction and ``η_{i-1}'`` is the vector transported -last direction multiplied by momentum ``m``. +Store the necessary information to compute the [`MomentumGradient`](@ref) +direction update. # Fields -* `p_old`: remember the last iterate for parallel transporting the last direction -* `momentum`: factor for momentum +$(_var(:Field, :p, "p_old")) +* `momentum::Real`: factor for the momentum * `direction`: internal [`DirectionUpdateRule`](@ref) to determine directions to add the momentum to. -* `vector_transport_method`: vector transport method to use -* `X_old`: the last gradient/direction update added as momentum +$(_var(:Field, :vector_transport_method)) +$(_var(:Field, :X, "X_old")) # Constructors -Add momentum to a gradient problem, where by default just a gradient evaluation is used - MomentumGradient( - M::AbstractManifold; - p=rand(M), - s::DirectionUpdateRule=IdentityUpdateRule(); - X=zero_vector(M, p), - momentum=0.2 - vector_transport_method=default_vector_transport_method(M, typeof(p)), - ) + MomentumGradientRule(M::AbstractManifold; kwargs...) Initialize a momentum gradient rule to `s`, where `p` and `X` are memory for interim values. 
+ +## Keyword arguments + +$(_var(:Keyword, :p)) +* `s=`[`IdentityUpdateRule`](@ref)`()` +* `momentum=0.2` +$(_var(:Keyword, :vector_transport_method)) +$(_var(:Keyword, :X)) + + +# See also +[`MomentumGradient`](@ref) """ -mutable struct MomentumGradient{P,T,R<:Real,VTM<:AbstractVectorTransportMethod} <: - DirectionUpdateRule +mutable struct MomentumGradientRule{ + P,T,D<:DirectionUpdateRule,R<:Real,VTM<:AbstractVectorTransportMethod +} <: DirectionUpdateRule momentum::R p_old::P - direction::DirectionUpdateRule + direction::D vector_transport_method::VTM X_old::T end -function MomentumGradient( - M::AbstractManifold, - p::P=rand(M); - direction::DirectionUpdateRule=IdentityUpdateRule(), +function MomentumGradientRule( + M::AbstractManifold; + p::P=rand(M), + direction::Union{<:DirectionUpdateRule,ManifoldDefaultsFactory}=Gradient(), vector_transport_method::VTM=default_vector_transport_method(M, typeof(p)), - X=zero_vector(M, p), - momentum=0.2, -) where {P,VTM<:AbstractVectorTransportMethod} - return MomentumGradient{P,typeof(X),typeof(momentum),VTM}( - momentum, p, direction, vector_transport_method, X + X::Q=zero_vector(M, p), + momentum::F=0.2, +) where {P,Q,F<:Real,VTM<:AbstractVectorTransportMethod} + dir = _produce_type(direction, M) + return MomentumGradientRule{P,Q,typeof(dir),F,VTM}( + momentum, p, dir, vector_transport_method, X ) end -function (mg::MomentumGradient)( +function (mg::MomentumGradientRule)( mp::AbstractManoptProblem, s::AbstractGradientSolverState, k ) M = get_manifold(mp) @@ -350,24 +351,52 @@ function (mg::MomentumGradient)( end """ - AverageGradient <: DirectionUpdateRule + MomentumGradient() + +Append a momentum to a gradient processor, where the last direction and last iterate are +stored and the new is composed as ``η_i = m*η_{i-1}' - s d_i``, +where ``sd_i`` is the current (inner) direction and ``η_{i-1}'`` is the vector transported +last direction multiplied by momentum ``m``. 
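As a hedged usage sketch of the factory style this hunk introduces (the cost, gradient, and start point are made up for illustration; Manopt.jl ≥ 0.5 with Manifolds.jl assumed), the manifold no longer needs to be repeated in the keyword:

```julia
using Manopt, Manifolds

M = Sphere(2)
f(M, p) = p[3]^2
grad_f(M, p) = project(M, p, [0.0, 0.0, 2 * p[3]])

# MomentumGradient(; momentum=0.2) only builds a ManifoldDefaultsFactory;
# gradient_descent resolves it to a MomentumGradientRule on M internally.
q = gradient_descent(M, f, grad_f, [0.0, 1.0, 0.0];
    direction=MomentumGradient(; momentum=0.2),
)
```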
+ +# Input + +* `M` (optional) + +# Keyword arguments + +$(_var(:Keyword, :p)) +* `direction=`[`IdentityUpdateRule`](@ref) preprocess the actual gradient before adding momentum +$(_var(:Keyword, :X)) +* `momentum=0.2` amount of momentum to use +$(_var(:Keyword, :vector_transport_method)) + +$(_note(:ManifoldDefaultFactory, "MomentumGradientRule")) +""" +MomentumGradient(args...; kwargs...) = + ManifoldDefaultsFactory(Manopt.MomentumGradientRule, args...; kwargs...) + +""" + AverageGradientRule <: DirectionUpdateRule Add an average of gradients to a gradient processor. A set of previous directions (from the -inner processor) and the last iterate are stored, average is taken after vector transporting +inner processor) and the last iterate are stored. The average is taken after vector transporting them to the current iterates tangent space. + # Fields + * `gradients`: the last `n` gradient/direction updates * `last_iterate`: last iterate (needed to transport the gradients) * `direction`: internal [`DirectionUpdateRule`](@ref) to determine directions to apply the averaging to -* `vector_transport_method`: vector transport method to use +$(_var(:Keyword, :vector_transport_method)) # Constructors - AverageGradient( - M::AbstractManifold, + + AverageGradientRule( + M::AbstractManifold; p::P=rand(M); n::Int=10 - s::DirectionUpdateRule=IdentityUpdateRule(); + direction::Union{<:DirectionUpdateRule,ManifoldDefaultsFactory}=IdentityUpdateRule(), gradients = fill(zero_vector(p.M, o.x),n), last_iterate = deepcopy(x0), vector_transport_method = default_vector_transport_method(M, typeof(p)) @@ -376,104 +405,126 @@ them to the current iterates tangent space. 
Add average to a gradient problem, where * `n`: determines the size of averaging -* `s`: is the internal [`DirectionUpdateRule`](@ref) to determine the gradients to store +* `direction`: is the internal [`DirectionUpdateRule`](@ref) to determine the gradients to store * `gradients`: can be pre-filled with some history * `last_iterate`: stores the last iterate -* `vector_transport_method`: determines how to transport all gradients to the current iterates tangent space before averaging +$(_var(:Keyword, :vector_transport_method)) """ -mutable struct AverageGradient{P,T,VTM<:AbstractVectorTransportMethod} <: - DirectionUpdateRule +mutable struct AverageGradientRule{ + P,T,D<:DirectionUpdateRule,VTM<:AbstractVectorTransportMethod +} <: DirectionUpdateRule gradients::AbstractVector{T} last_iterate::P - direction::DirectionUpdateRule + direction::D vector_transport_method::VTM end -function AverageGradient( - M::AbstractManifold, - p::P=rand(M); +function AverageGradientRule( + M::AbstractManifold; + p::P=rand(M), n::Int=10, - direction::DirectionUpdateRule=IdentityUpdateRule(), + direction::Union{<:DirectionUpdateRule,ManifoldDefaultsFactory}=Gradient(), gradients=[zero_vector(M, p) for _ in 1:n], vector_transport_method::VTM=default_vector_transport_method(M, typeof(p)), ) where {P,VTM} - return AverageGradient{P,eltype(gradients),VTM}( - gradients, p, direction, vector_transport_method + dir = _produce_type(direction, M) + return AverageGradientRule{P,eltype(gradients),typeof(dir),VTM}( + gradients, copy(M, p), dir, vector_transport_method ) end -function (a::AverageGradient)(mp::AbstractManoptProblem, s::AbstractGradientSolverState, k) +function (a::AverageGradientRule)( + mp::AbstractManoptProblem, s::AbstractGradientSolverState, k +) + # remove oldest/last pop!(a.gradients) M = get_manifold(mp) - step, d = a.direction(mp, s, k) #get inner gradient and step - a.gradients = vcat([deepcopy(d)], a.gradients) - for i in 1:(length(a.gradients) - 1) #transport & shift in 
place
-        vector_transport_to!(
-            M,
-            a.gradients[i],
-            a.last_iterate,
-            a.gradients[i + 1],
-            get_iterate(s),
-            a.vector_transport_method,
-        )
+    p = get_iterate(s)
+    _, d = a.direction(mp, s, k) #get inner gradient and step
+    for g in a.gradients
+        vector_transport_to!(M, g, a.last_iterate, g, p, a.vector_transport_method)
     end
-    a.gradients[1] = deepcopy(d)
-    copyto!(M, a.last_iterate, get_iterate(s))
-    return step, 1 / length(a.gradients) .* sum(a.gradients)
+    pushfirst!(a.gradients, copy(M, p, d))
+    copyto!(M, a.last_iterate, p)
+    return 1.0, 1 / length(a.gradients) .* sum(a.gradients)
 end
 
-@doc raw"""
-    Nesterov <: DirectionUpdateRule
+"""
+    AverageGradient(; kwargs...)
+    AverageGradient(M::AbstractManifold; kwargs...)
 
-## Fields
-* `γ`
-* `μ` the strong convexity coefficient
-* `v` (=``=v_k``, ``v_0=x_0``) an interim point to compute the next gradient evaluation point `y_k`
-* `shrinkage` (`= i -> 0.8`) a function to compute the shrinkage ``β_k`` per iterate.
+Add an average of gradients to a gradient processor. A set of previous directions (from the
+inner processor) and the last iterate are stored; the average is taken after vector transporting
+them to the current iterate's tangent space.
 
-Assume ``f`` is ``L``-Lipschitz and ``μ``-strongly convex. Given
+# Input
 
-* a step size ``h_k<\frac{1}{L}`` (from the [`GradientDescentState`](@ref)
-* a `shrinkage` parameter ``β_k``
-* and a current iterate ``x_k``
-* as well as the interim values ``γ_k`` and ``v_k`` from the previous iterate. 
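The same factory pattern applies to the averaging rule; a hypothetical sketch (illustrative cost and data, keyword names per the docstring in this hunk, Manopt.jl ≥ 0.5 assumed):

```julia
using Manopt, Manifolds

M = Euclidean(3)
f(M, p) = sum(p .^ 2)
grad_f(M, p) = 2 .* p

# average the n=10 most recent (vector-transported) gradients before stepping
q = gradient_descent(M, f, grad_f, [2.0, 1.0, 0.5];
    direction=AverageGradient(; n=10),
)
```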
+
-This compute a Nesterov type update using the following steps, see [ZhangSra:2018](@cite)
+* `M` (optional)
+
+# Keyword arguments
+
+$(_var(:Keyword, :p; add=:as_Initial))
+* `direction=`[`IdentityUpdateRule`](@ref) preprocess the actual gradient before adding momentum
+* `gradients=[zero_vector(M, p) for _ in 1:n]` how to initialise the internal storage
+* `n=10` number of gradient evaluations to take the mean over
+$(_var(:Keyword, :X))
+$(_var(:Keyword, :vector_transport_method))
 
-1. Compute the positive root ``α_k∈(0,1)`` of ``α^2 = h_k\bigl((1-α_k)γ_k+α_k μ\bigr)``.
-2. Set ``\bar γ_k+1 = (1-α_k)γ_k + α_kμ``
-3. ``y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)``
-4. ``x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))``
-5. ``v_{k+1} = `\operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\barγ_k}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)``
-6. ``γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}``
+$(_note(:ManifoldDefaultFactory, "AverageGradientRule"))
+"""
+AverageGradient(args...; kwargs...) =
+    ManifoldDefaultsFactory(Manopt.AverageGradientRule, args...; kwargs...)
+
+@doc """
+    NesterovRule <: DirectionUpdateRule
+
+Compute a Nesterov inspired direction update rule.
+See [`Nesterov`](@ref) for details.
+
+# Fields
 
-Then the direction from ``x_k`` to ``x_k+1`` by ``d = \operatorname{retr}^{-1}_{x_k}x_{k+1}`` is returned.
+* `γ::Real`, `μ::Real`: coefficients from the last iterate
+* `v::P`: an interim point to compute the next gradient evaluation point `y_k`
+* `shrinkage`: a function `k -> ...` to compute the shrinkage ``β_k`` per iterate `k`.
+$(_var(:Field, :inverse_retraction_method))
 
 # Constructor
 
-    Nesterov(M::AbstractManifold, p::P; γ=0.001, μ=0.9, shrinkage = k -> 0.8;
-        inverse_retraction_method=LogarithmicInverseRetraction())
-Initialize the Nesterov acceleration, where `x0` initializes `v`. 
+
+    NesterovRule(M::AbstractManifold; kwargs...)
+
+## Keyword arguments
+
+$(_var(:Keyword, :p; add=:as_Initial))
+* `γ=0.001`
+* `μ=0.9`
+* `shrinkage = k -> 0.8`
+$(_var(:Keyword, :inverse_retraction_method))
+
+# See also
+
+[`Nesterov`](@ref)
 """
-mutable struct Nesterov{P,R<:Real} <: DirectionUpdateRule
+mutable struct NesterovRule{P,R<:Real} <: DirectionUpdateRule
     γ::R
     μ::R
     v::P
     shrinkage::Function
     inverse_retraction_method::AbstractInverseRetractionMethod
 end
-Nesterov(M::AbstractManifold, p::Number; kwargs...) = Nesterov(M, [p]; kwargs...)
-function Nesterov(
-    M::AbstractManifold,
-    p::P;
+function NesterovRule(
+    M::AbstractManifold;
+    p::P=rand(M),
     γ::T=0.001,
     μ::T=0.9,
     shrinkage::Function=i -> 0.8,
     inverse_retraction_method::AbstractInverseRetractionMethod=default_inverse_retraction_method(
-        M, P
+        M, typeof(p)
     ),
 ) where {P,T}
-    return Nesterov{P,T}(γ, μ, copy(M, p), shrinkage, inverse_retraction_method)
+    p_ = _ensure_mutating_variable(p)
+    return NesterovRule{typeof(p_),T}(
+        γ, μ, copy(M, p_), shrinkage, inverse_retraction_method
+    )
 end
-function (n::Nesterov)(mp::AbstractManoptProblem, s::AbstractGradientSolverState, k)
+function (n::NesterovRule)(mp::AbstractManoptProblem, s::AbstractGradientSolverState, k)
     M = get_manifold(mp)
     h = get_stepsize(mp, s, k)
     p = get_iterate(s)
@@ -495,6 +546,46 @@ function (n::NesterovRule)(mp::AbstractManoptProblem, s::AbstractGradientSolverState
     return h, (-1 / h) * inverse_retract(M, p, xn, n.inverse_retraction_method) # outer update
 end
 
+@doc """
+    Nesterov(; kwargs...)
+    Nesterov(M::AbstractManifold; kwargs...)
+
+Assume ``f`` is ``L``-Lipschitz and ``μ``-strongly convex. Given
+
+* a step size ``h_k<$(_tex(:frac, "1", "L"))`` (from the [`GradientDescentState`](@ref))
+* a `shrinkage` parameter ``β_k``
+* and a current iterate ``p_k``
+* as well as the interim values ``γ_k`` and ``v_k`` from the previous iterate.
+
+This computes a Nesterov type update using the following steps, see [ZhangSra:2018](@cite)
+
+1. 
Compute the positive root ``α_k∈(0,1)`` of ``α^2 = h_k$(_tex(:bigl))((1-α_k)γ_k+α_k μ$(_tex(:bigr)))``.
+2. Set ``$(_tex(:bar, "γ"))_{k+1} = (1-α_k)γ_k + α_kμ``
+3. ``y_k = $(_tex(:retr))_{p_k}\\Bigl(\\frac{α_kγ_k}{γ_k + α_kμ}$(_tex(:retr))^{-1}_{p_k}v_k \\Bigr)``
+4. ``x_{k+1} = $(_tex(:retr))_{y_k}(-h_k $(_tex(:grad))f(y_k))``
+5. ``v_{k+1} = $(_tex(:retr))_{y_k}\\Bigl(\\frac{(1-α_k)γ_k}{$(_tex(:bar, "γ"))_k}$(_tex(:retr))_{y_k}^{-1}(v_k) - \\frac{α_k}{$(_tex(:bar, "γ"))_{k+1}}$(_tex(:grad))f(y_k) \\Bigr)``
+6. ``γ_{k+1} = \\frac{1}{1+β_k}$(_tex(:bar, "γ"))_{k+1}``
+
+Then the direction from ``p_k`` to ``p_{k+1}`` by ``d = $(_tex(:invretr))_{p_k}p_{k+1}`` is returned.
+
+# Input
+
+* `M` (optional)
+
+# Keyword arguments
+
+$(_var(:Keyword, :p; add=:as_Initial))
+* `γ=0.001`
+* `μ=0.9`
+* `shrinkage = k -> 0.8`
+$(_var(:Keyword, :inverse_retraction_method))
+
+$(_note(:ManifoldDefaultFactory, "NesterovRule"))
+"""
+function Nesterov(args...; kwargs...)
+    return ManifoldDefaultsFactory(Manopt.NesterovRule, args...; kwargs...)
+end
+
 @doc raw"""
     DebugGradient <: DebugAction
 
diff --git a/src/plans/hessian_plan.jl b/src/plans/hessian_plan.jl
index 1cd1861cba..0a2ab47fb0 100644
--- a/src/plans/hessian_plan.jl
+++ b/src/plans/hessian_plan.jl
@@ -216,12 +216,12 @@ _doc_ApproxHessian_step = raw"\operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X
 
 A functor to approximate the Hessian by a finite difference of gradient evaluation. 
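To illustrate the `Nesterov` factory defined just above (cost, gradient, and start point are invented; Manopt.jl ≥ 0.5 with Manifolds.jl assumed):

```julia
using Manopt, Manifolds

M = Euclidean(2)
f(M, p) = sum((p .- 1) .^ 2)
grad_f(M, p) = 2 .* (p .- 1)

# γ, μ, and shrinkage are stored in the factory until the manifold is known,
# then a NesterovRule is produced internally by the solver.
q = gradient_descent(M, f, grad_f, [3.0, -2.0];
    direction=Nesterov(; γ=0.001, μ=0.9),
)
```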
-Given a point `p` and a direction `X` and the gradient ``$(_l_grad) f(p)``
+Given a point `p`, a direction `X`, and the gradient ``$(_tex(:grad)) f(p)``
 of a function ``f`` the Hessian is approximated as follows:
-let ``c`` be a stepsize, ``X ∈ $(_l_TpM())`` a tangent vector and ``q = $_doc_ApproxHessian_step``
+let ``c`` be a stepsize, ``X ∈ $(_math(:TpM))`` a tangent vector and ``q = $_doc_ApproxHessian_step``
 be a step in direction ``X`` of length ``c`` following a retraction
 Then the Hessian is approximated by the finite difference of the gradients,
-where ``$_l_vt`` is a vector transport.
+where ``$(_math(:vector_transport, :symbol))`` is a vector transport.
 
 $_doc_ApproxHessian_formula
 
@@ -229,8 +229,8 @@
 
 * `gradient!!`: the gradient function (either allocating or mutating, see `evaluation` parameter)
 * `step_length`: a step length for the finite difference
-* `retraction_method`: a retraction to use
-* `vector_transport_method`: a vector transport to use
+$(_var(:Keyword, :retraction_method))
+$(_var(:Keyword, :vector_transport_method))
 
 ## Internal temporary fields
 
@@ -244,12 +244,10 @@
 
 ## Keyword arguments
 
-* `evaluation=`[`AllocatingEvaluation`](@ref)) whether the gradient is given as an allocation function or an in-place ([`InplaceEvaluation`](@ref)). 
+$(_var(:Keyword, :evaluation)) * `steplength=`2^{-14}``: step length ``c`` to approximate the gradient evaluations -* $_kw_retraction_method_default - $_kw_retraction_method -* $_kw_vector_transport_method_default - $_kw_vector_transport_method +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :vector_transport_method)) """ mutable struct ApproxHessianFiniteDifference{E,P,T,G,RTR,VTR,R<:Real} <: AbstractApproxHessian @@ -311,7 +309,7 @@ function (f::ApproxHessianFiniteDifference{InplaceEvaluation})(M, Y, p, X) return Y end -@doc raw""" +@doc """ ApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian A functor to approximate the Hessian by the symmetric rank one update. @@ -320,7 +318,7 @@ A functor to approximate the Hessian by the symmetric rank one update. * `gradient!!`: the gradient function (either allocating or mutating, see `evaluation` parameter). * `ν`: a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down. -* `vector_transport_method`: a vector transport to use. +$(_var(:Keyword, :vector_transport_method)). ## Internal temporary fields @@ -338,8 +336,8 @@ A functor to approximate the Hessian by the symmetric rank one update. * `initial_operator` (`Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))`) the matrix representation of the initial approximating operator. * `basis` (`DefaultOrthonormalBasis()`) an orthonormal basis in the tangent space of the initial iterate p. * `nu` (`-1`) -* `evaluation` ([`AllocatingEvaluation`](@ref)) whether the gradient is given as an allocation function or an in-place ([`InplaceEvaluation`](@ref)). -* `vector_transport_method` (`ParallelTransport()`) vector transport ``\mathcal T_{\cdot\gets\cdot}`` to use. 
+$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :vector_transport_method)) """ mutable struct ApproxHessianSymmetricRankOne{E,P,G,T,B<:AbstractBasis{ℝ},VTR,R<:Real} <: AbstractApproxHessian @@ -361,9 +359,9 @@ function ApproxHessianSymmetricRankOne( basis::B=DefaultOrthonormalBasis(), nu::R=-1.0, evaluation=AllocatingEvaluation(), - vector_transport_method::VTR=ParallelTransport(), + vector_transport_method::VTM=default_vector_transport_method(M, typeof(p)), ) where { - mT<:AbstractManifold,P,G,B<:AbstractBasis{ℝ},R<:Real,VTR<:AbstractVectorTransportMethod + mT<:AbstractManifold,P,G,B<:AbstractBasis{ℝ},R<:Real,VTM<:AbstractVectorTransportMethod } if evaluation isa AllocatingEvaluation grad_tmp = gradient(M, p) @@ -372,7 +370,7 @@ function ApproxHessianSymmetricRankOne( gradient(M, grad_tmp, p) end - return ApproxHessianSymmetricRankOne{typeof(evaluation),P,G,typeof(grad_tmp),B,VTR,R}( + return ApproxHessianSymmetricRankOne{typeof(evaluation),P,G,typeof(grad_tmp),B,VTM,R}( p, gradient, grad_tmp, initial_operator, basis, vector_transport_method, nu ) end @@ -458,26 +456,33 @@ function update_hessian_basis!(M, f::ApproxHessianSymmetricRankOne{InplaceEvalua return f.gradient!!(M, f.grad_tmp, f.p_tmp) end -@doc raw""" +@doc """ ApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian A functor to approximate the Hessian by the BFGS update. + # Fields + * `gradient!!` the gradient function (either allocating or mutating, see `evaluation` parameter). * `scale` -* `vector_transport_method` a vector transport to use. +$(_var(:Field, :vector_transport_method)) + ## Internal temporary fields + * `p_tmp` a temporary storage the current point `p`. * `grad_tmp` a temporary storage for the gradient at the current `p`. * `matrix` a temporary storage for the matrix representation of the approximating operator. * `basis` a temporary storage for an orthonormal basis at the current `p`. + # Constructor ApproxHessianBFGS(M, p, gradF; kwargs...) 
+ ## Keyword arguments + * `initial_operator` (`Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))`) the matrix representation of the initial approximating operator. * `basis` (`DefaultOrthonormalBasis()`) an orthonormal basis in the tangent space of the initial iterate p. * `nu` (`-1`) -* `evaluation` ([`AllocatingEvaluation`](@ref)) whether the gradient is given as an allocation function or an in-place ([`InplaceEvaluation`](@ref)). -* `vector_transport_method` (`ParallelTransport()`) vector transport ``\mathcal T_{\cdot\gets\cdot}`` to use. +$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :vector_transport_method)) """ mutable struct ApproxHessianBFGS{ E,P,G,T,B<:AbstractBasis{ℝ},VTR<:AbstractVectorTransportMethod @@ -500,15 +505,15 @@ function ApproxHessianBFGS( basis::B=DefaultOrthonormalBasis(), scale::Bool=true, evaluation=AllocatingEvaluation(), - vector_transport_method::VTR=ParallelTransport(), -) where {mT<:AbstractManifold,P,G,B<:AbstractBasis{ℝ},VTR<:AbstractVectorTransportMethod} + vector_transport_method::VTM=default_vector_transport_method(M, typeof(p)), +) where {mT<:AbstractManifold,P,G,B<:AbstractBasis{ℝ},VTM<:AbstractVectorTransportMethod} if evaluation == AllocatingEvaluation() grad_tmp = gradient(M, p) elseif evaluation == InplaceEvaluation() grad_tmp = zero_vector(M, p) gradient(M, grad_tmp, p) end - return ApproxHessianBFGS{typeof(evaluation),P,G,typeof(grad_tmp),B,VTR}( + return ApproxHessianBFGS{typeof(evaluation),P,G,typeof(grad_tmp),B,VTM}( p, gradient, grad_tmp, initial_operator, basis, vector_transport_method, scale ) end diff --git a/src/plans/higher_order_primal_dual_plan.jl b/src/plans/higher_order_primal_dual_plan.jl index 2acceac7e0..8068fdd840 100644 --- a/src/plans/higher_order_primal_dual_plan.jl +++ b/src/plans/higher_order_primal_dual_plan.jl @@ -1,4 +1,4 @@ -@doc raw""" +@doc """ PrimalDualManifoldSemismoothNewtonObjective{E<:AbstractEvaluationType, TC, LO, ALO, PF, DPF, PG, DPG, L} <: 
AbstractPrimalDualManifoldObjective{E, TC, PF}
 
 Describes a Problem for the Primal-dual Riemannian semismooth Newton algorithm. [DiepeveenLellmann:2021](@cite)
@@ -7,11 +7,11 @@
 
 * `cost`: ``F + G(Λ(⋅))`` to evaluate interim cost function values
 * `linearized_operator`: the linearization ``DΛ(⋅)[⋅]`` of the operator ``Λ(⋅)``.
-* `linearized_adjoint_operator`: the adjoint differential ``(DΛ)^* : \mathcal N → T\mathcal M``
+* `linearized_adjoint_operator`: the adjoint differential ``(DΛ)^* : $(_math(:M; M="N")) → $(_math(:TM))``
 * `prox_F`: the proximal map belonging to ``F``
 * `diff_prox_F`: the (Clarke Generalized) differential of the proximal maps of ``F``
-* `prox_G_dual`: the proximal map belonging to ``g_n^*``
-* `diff_prox_dual_G`: the (Clarke Generalized) differential of the proximal maps of ``G^\ast_n``
+* `prox_G_dual`: the proximal map belonging to ``G^$(_tex(:ast))_n``
+* `diff_prox_dual_G`: the (Clarke Generalized) differential of the proximal maps of ``G^$(_tex(:ast))_n``
 * `Λ`: the exact forward operator. This operator is required if `Λ(m)=n` does not hold. 
# Constructor @@ -68,19 +68,19 @@ end # Fields -* `m`: base point on ``$_l_M`` -* `n`: base point on ``$(_l_Manifold("N"))`` -* `x`: an initial point on ``x^{(0)} ∈ $_l_M`` (and its previous iterate) -* `ξ`: an initial tangent vector ``\\xi^{(0)} ∈ T_{n}^*$(_l_Manifold("N"))`` (and its previous iterate) +$(_var(:Field, :p, "m")) +$(_var(:Field, :p, "n", "Q"; M="N")) +$(_var(:Field, :p; add=[:as_Iterate])) +$(_var(:Field, :X)) * `primal_stepsize::Float64`: proximal parameter of the primal prox * `dual_stepsize::Float64`: proximal parameter of the dual prox * `reg_param::Float64`: regularisation parameter for the Newton matrix -* `stop::`[`StoppingCriterion`](@ref): a [`StoppingCriterion`](@ref) +$(_var(:Field, :stopping_criterion, "stop")) * `update_primal_base`: function to update the primal base * `update_dual_base`: function to update the dual base -* $_field_retr -* $_field_inv_retr -* $_field_vector_transp +$(_var(:Field, :inverse_retraction_method)) +$(_var(:Field, :retraction_method)) +$(_var(:Field, :vector_transport_method)) where for the update functions a [`AbstractManoptProblem`](@ref) `amp`, [`AbstractManoptSolverState`](@ref) `ams` and the current iterate `i` are the arguments. @@ -89,26 +89,25 @@ If you activate these to be different from the default identity, you have to pro # Constructor - PrimalDualSemismoothNewtonState(M::AbstractManifold, m::P, n::Q, x::P, ξ::T; kwargs...) + PrimalDualSemismoothNewtonState(M::AbstractManifold; kwargs...) Generate a state for the [`primal_dual_semismooth_Newton`](@ref). 
## Keyword arguments +* `m=`$(Manopt._link(:rand)) +* `n=`$(Manopt._link(:rand; M="N")) +* `p=`$(Manopt._link(:rand)) +* `X=`$(Manopt._link(:zero_vector)) * `primal_stepsize=1/sqrt(8)` * `dual_stepsize=1/sqrt(8)` * `reg_param=1e-5` * `update_primal_base=(amp, ams, k) -> o.m` * `update_dual_base=(amp, ams, k) -> o.n` -* $_kw_retraction_method_default: - $_kw_retraction_method -* $_kw_inverse_retraction_method_default: - $_kw_inverse_retraction_method -* `stopping_criterion=`[`StopAfterIteration`](@ref)(50): - $_kw_stopping_criterion -* $_kw_vector_transport_method_default: - $_kw_vector_transport_method - +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="`[`StopAfterIteration`](@ref)`(50)`")) +$(_var(:Keyword, :vector_transport_method)) """ mutable struct PrimalDualSemismoothNewtonState{ P, @@ -133,11 +132,11 @@ mutable struct PrimalDualSemismoothNewtonState{ vector_transport_method::VTM function PrimalDualSemismoothNewtonState( - M::AbstractManifold, - m::P, - n::Q, - p::P, - X::T; + M::AbstractManifold; + m::P=rand(M), + n::Q=rand(N), + p::P=rand(M), + X::T=zero_vector(M, p), primal_stepsize::Float64=1 / sqrt(8), dual_stepsize::Float64=1 / sqrt(8), regularization_parameter::Float64=1e-5, @@ -198,14 +197,14 @@ function set_iterate!(pdsn::PrimalDualSemismoothNewtonState, p) pdsn.p = p return pdsn end -@doc raw""" +@doc """ y = get_differential_primal_prox(M::AbstractManifold, pdsno::PrimalDualManifoldSemismoothNewtonObjective σ, x) get_differential_primal_prox!(p::TwoManifoldProblem, y, σ, x) Evaluate the differential proximal map of ``F`` stored within [`AbstractPrimalDualManifoldObjective`](@ref) ```math -D\operatorname{prox}_{σF}(x)[X] +D$(_tex(:prox))_{σF}(x)[X] ``` which can also be computed in place of `y`. 
@@ -279,14 +278,14 @@ function get_differential_primal_prox!( return get_differential_primal_prox!(M, Y, get_objective(admo, false), σ, p, X) end -@doc raw""" +@doc """ η = get_differential_dual_prox(N::AbstractManifold, pdsno::PrimalDualManifoldSemismoothNewtonObjective, n, τ, X, ξ) get_differential_dual_prox!(N::AbstractManifold, pdsno::PrimalDualManifoldSemismoothNewtonObjective, η, n, τ, X, ξ) Evaluate the differential proximal map of ``G_n^*`` stored within [`PrimalDualManifoldSemismoothNewtonObjective`](@ref) ```math -D\operatorname{prox}_{τG_n^*}(X)[ξ] +D$(_tex(:prox))_{τG_n^*}(X)[ξ] ``` which can also be computed in place of `η`. diff --git a/src/plans/interior_point_Newton_plan.jl b/src/plans/interior_point_Newton_plan.jl index 4b70f04aeb..9673a7f706 100644 --- a/src/plans/interior_point_Newton_plan.jl +++ b/src/plans/interior_point_Newton_plan.jl @@ -12,6 +12,7 @@ if these are different from the iterate and search direction of the main solver. # Constructor StepsizeState(p,X) + StepsizeState(M::AbstractManifold; p=rand(M), x=zero_vector(M,p) # See also @@ -21,31 +22,32 @@ struct StepsizeState{P,T} <: AbstractManoptSolverState p::P X::T end +StepsizeState(M::AbstractManifold; p=rand(M), X=zero_vector(M, p)) = StepsizeState(p, X) get_iterate(s::StepsizeState) = s.p get_gradient(s::StepsizeState) = s.X set_iterate!(s::StepsizeState, M, p) = copyto!(M, s.p, p) set_gradient!(s::StepsizeState, M, p, X) = copyto!(M, s.X, p, X) @doc """ - InteriorPointNewtonState <: AbstractHessianSolverState + InteriorPointNewtonState{P,T} <: AbstractHessianSolverState # Fields * `λ`: the Lagrange multiplier with respect to the equality constraints * `μ`: the Lagrange multiplier with respect to the inequality constraints -* `p`: the current iterate +$(_var(:Field, :p; add=[:as_Iterate])) * `s`: the current slack variable -* `sub_problem`: an [`AbstractManoptProblem`](@ref) problem for the subsolver -* `sub_state`: an [`AbstractManoptSolverState`](@ref) for the subsolver 
+$(_var(:Field, :sub_problem)) +$(_var(:Field, :sub_state)) * `X`: the current gradient with respect to `p` * `Y`: the current gradient with respect to `μ` * `Z`: the current gradient with respect to `λ` * `W`: the current gradient with respect to `s` * `ρ`: store the orthogonality `μ's/m` to compute the barrier parameter `β` in the sub problem * `σ`: scaling factor for the barrier parameter `β` in the sub problem -* `stop`: a [`StoppingCriterion`](@ref) indicating when to stop. -* `retraction_method`: the retraction method to use on `M`. -* `stepsize::`[`Stepsize`](@ref): the stepsize to use +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :retraction_method)) +$(_var(:Field, :stepsize)) * `step_problem`: an [`AbstractManoptProblem`](@ref) storing the manifold and objective for the line search * `step_state`: storing iterate and search direction in a state for the line search, see [`StepsizeState`](@ref) @@ -54,7 +56,6 @@ set_gradient!(s::StepsizeState, M, p, X) = copyto!(M, s.X, p, X) InteriorPointNewtonState( M::AbstractManifold, cmo::ConstrainedManifoldObjective, - p, sub_problem::Pr, sub_state::St; kwargs... @@ -65,16 +66,16 @@ are used to fill in reasonable defaults for the keywords. 
# Input -$(_arg_M) +$(_var(:Argument, :M; type=true)) * `cmo`: a [`ConstrainedManifoldObjective`](@ref) -$(_arg_p) -$(_arg_sub_problem) -$(_arg_sub_state) +$(_var(:Argument, :sub_problem)) +$(_var(:Argument, :sub_state)) # Keyword arguments Let `m` and `n` denote the number of inequality and equality constraints, respectively +$(_var(:Keyword, :p; add=:as_Initial)) * `μ=ones(m)` * `X=`[`zero_vector`](@extref `ManifoldsBase.zero_vector-Tuple{AbstractManifold, Any}`)`(M,p)` * `Y=zero(μ)` @@ -84,14 +85,14 @@ Let `m` and `n` denote the number of inequality and equality constraints, respec * `W=zero(s)` * `ρ=μ's/m` * `σ=`[`calculate_σ`](@ref)`(M, cmo, p, μ, λ, s)` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`[` | `](@ref StopWhenAny)[`StopWhenChangeLess`](@ref)`(1e-8)` -* `retraction_method=default_retraction_method(M, typeof(p))` +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`[` | `](@ref StopWhenAny)[`StopWhenChangeLess`](@ref)`(1e-8)`")) +$(_var(:Keyword, :retraction_method)) * `step_objective=`[`ManifoldGradientObjective`](@ref)`(`[`KKTVectorFieldNormSq`](@ref)`(cmo)`, [`KKTVectorFieldNormSqGradient`](@ref)`(cmo)`; evaluation=[`InplaceEvaluation`](@ref)`())` * `vector_space=`[`Rn`](@ref Manopt.Rn): a function that, given an integer, returns the manifold to be used for the vector space components ``ℝ^m,ℝ^n`` -* `step_problem`: wrap the manifold ``$(_l_M) × ℝ^m × ℝ^n × ℝ^m`` +* `step_problem`: wrap the manifold ``$(_math(:M)) × ℝ^m × ℝ^n × ℝ^m`` * `step_state`: the [`StepsizeState`](@ref) with point and search direction -* `stepsize`: an [`ArmijoLinesearch`](@ref) with the [`InteriorPointCentralityCondition`](@ref) as - additional condition to accept a step. 
Note that this step size operates on its own `step_problem`and `step_state` +$(_var(:Keyword, :stepsize; default="[`ArmijoLinesearch`](@ref)`()`", add=" with the [`InteriorPointCentralityCondition`](@ref) as + additional condition to accept a step")) and internally `_step_M` and `_step_p` for the manifold and point in the stepsize. """ @@ -128,9 +129,9 @@ mutable struct InteriorPointNewtonState{ function InteriorPointNewtonState( M::AbstractManifold, cmo::ConstrainedManifoldObjective, - p::P, sub_problem::Pr, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}; + sub_state::St; + p::P=rand(M), X::T=zero_vector(M, p), μ::V=ones(length(get_inequality_constraint(M, cmo, p, :))), Y::V=zero(μ), @@ -153,8 +154,8 @@ mutable struct InteriorPointNewtonState{ step_problem::StepPr=DefaultManoptProblem(_step_M, step_objective), _step_p=rand(_step_M), step_state::StepSt=StepsizeState(_step_p, zero_vector(_step_M, _step_p)), - centrality_condition::F=(N, p) -> true, # Todo - stepsize::S=ArmijoLinesearch( + centrality_condition::F=(N, p) -> true, + stepsize::S=ArmijoLinesearchStepsize( get_manifold(step_problem); retraction_method=default_retraction_method(get_manifold(step_problem)), initial_stepsize=1.0, @@ -164,7 +165,8 @@ mutable struct InteriorPointNewtonState{ ) where { P, T, - Pr, + Pr<:Union{AbstractManoptProblem,F} where {F}, + St<:AbstractManoptSolverState, V, R, F, @@ -174,11 +176,10 @@ mutable struct InteriorPointNewtonState{ RTM<:AbstractRetractionMethod, S<:Stepsize, } - sub_state_storage = maybe_wrap_evaluation_type(sub_state) - ips = new{P,T,Pr,typeof(sub_state_storage),V,R,SC,RTM,S,StepPr,StepSt}() + ips = new{P,T,Pr,St,V,R,SC,RTM,S,StepPr,StepSt}() ips.p = p ips.sub_problem = sub_problem - ips.sub_state = sub_state_storage + ips.sub_state = sub_state ips.μ = μ ips.λ = λ ips.s = s @@ -196,7 +197,16 @@ mutable struct InteriorPointNewtonState{ return ips end end - +function InteriorPointNewtonState( + M::AbstractManifold, + 
cmo::ConstrainedManifoldObjective, + sub_problem; + evaluation::E=AllocatingEvaluation(), + kwargs..., +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return InteriorPointNewtonState(M, cmo, sub_problem, cfs; kwargs...) +end # get & set iterate get_iterate(ips::InteriorPointNewtonState) = ips.p function set_iterate!(ips::InteriorPointNewtonState, ::AbstractManifold, p) @@ -828,9 +838,9 @@ The parameters `τ1`, `τ2` are initialise to zero if not provided. !!! note - Besides [`get_manopt_parameter`](@ref) for all three constants, - and [`set_manopt_parameter!`](@ref) for ``γ``, - to update ``τ_1`` and ``τ_2``, call `set_manopt_parameter(ipcc, :τ, N, q)` to update + Besides [`get_parameter`](@ref) for all three constants, + and [`set_parameter!`](@ref) for ``γ``, + to update ``τ_1`` and ``τ_2``, call `set_parameter(ipcc, :τ, N, q)` to update both ``τ_1`` and ``τ_2`` according to the formulae above. """ mutable struct InteriorPointCentralityCondition{CO,R} @@ -853,20 +863,20 @@ function (ipcc::InteriorPointCentralityCondition)(N, qα) (sum(μα .* sα) - ipcc.γ * ipcc.τ2 * normKKTqα < 0) && return false return true end -function get_manopt_parameter(ipcc::InteriorPointCentralityCondition, ::Val{:γ}) +function get_parameter(ipcc::InteriorPointCentralityCondition, ::Val{:γ}) return ipcc.γ end -function set_manopt_parameter!(ipcc::InteriorPointCentralityCondition, ::Val{:γ}, γ) +function set_parameter!(ipcc::InteriorPointCentralityCondition, ::Val{:γ}, γ) ipcc.γ = γ return ipcc end -function get_manopt_parameter(ipcc::InteriorPointCentralityCondition, ::Val{:τ1}) +function get_parameter(ipcc::InteriorPointCentralityCondition, ::Val{:τ1}) return ipcc.τ1 end -function get_manopt_parameter(ipcc::InteriorPointCentralityCondition, ::Val{:τ2}) +function get_parameter(ipcc::InteriorPointCentralityCondition, ::Val{:τ2}) return ipcc.τ2 end -function set_manopt_parameter!(ipcc::InteriorPointCentralityCondition, ::Val{:τ}, N, q) +function 
set_parameter!(ipcc::InteriorPointCentralityCondition, ::Val{:τ}, N, q)
    μ = q[N, 2]
    s = q[N, 4]
    m = length(μ)
diff --git a/src/plans/manifold_default_factory.jl b/src/plans/manifold_default_factory.jl
new file mode 100644
index 0000000000..30f737525f
--- /dev/null
+++ b/src/plans/manifold_default_factory.jl
@@ -0,0 +1,124 @@
+"""
+    ManifoldDefaultsFactory{M,T,A,K}
+
+A generic factory to postpone the instantiation of certain types from within $(_link(:Manopt)),
+in order to be able to adapt it to defaults from different manifolds and/or postpone the
+decision on which manifold to use to a later point.
+
+For now this is established for
+
+* [`DirectionUpdateRule`](@ref)s (TODO: WIP)
+* [`Stepsize`](@ref) (TODO: WIP)
+* [`StoppingCriterion`](@ref) (TODO: WIP)
+
+This factory stores necessary and optional parameters as well as
+keyword arguments provided by the user to later produce the type this factory is for.
+
+Besides a manifold as a fallback, the factory can also be used for the (maybe simpler)
+types from the list of types that do not require the manifold.
+
+# Fields
+
+* `M::Union{Nothing,AbstractManifold}`: provide a manifold for defaults
+* `args::A`: arguments (`args...`) that are passed to the type constructor
+* `kwargs::K`: keyword arguments (`kwargs...`) that are passed to the type constructor
+* `constructor_requires_manifold::Bool`: indicate whether the type constructor requires the manifold or not
+
+
+# Constructor
+
+    ManifoldDefaultsFactory(T, args...; kwargs...)
+    ManifoldDefaultsFactory(T, M, args...; kwargs...)
+
+
+# Input
+
+* `T` a subtype of types listed above that this factory is to produce
+* `M` (optional) a manifold used for the defaults in case no manifold is provided.
+* `args...` arguments to pass to the constructor of `T`
+* `kwargs...` keyword arguments to pass (overwrite) when constructing `T`.
+ +# Keyword arguments + +* `requires_manifold=true`: indicate whether the type constructor this factory wraps + requires the manifold as first argument or not. + +All other keyword arguments are internally stored to be used in the type constructor + +as well as arguments and keyword arguments for the update rule. + +# see also + +[`_produce_type`](@ref) +""" +struct ManifoldDefaultsFactory{T,TM<:Union{<:AbstractManifold,Nothing},A,K} + M::TM + args::A + kwargs::K + constructor_requires_manifold::Bool +end +function ManifoldDefaultsFactory( + T::Type, M::TM, args...; requires_manifold=true, kwargs... +) where {TM<:AbstractManifold} + return ManifoldDefaultsFactory{T,TM,typeof(args),typeof(kwargs)}( + M, args, kwargs, requires_manifold + ) +end +function ManifoldDefaultsFactory(T::Type, args...; requires_manifold=true, kwargs...) + return ManifoldDefaultsFactory{T,Nothing,typeof(args),typeof(kwargs)}( + nothing, args, kwargs, requires_manifold + ) +end +function (mdf::ManifoldDefaultsFactory{T})(M::AbstractManifold) where {T} + if mdf.constructor_requires_manifold + return T(M, mdf.args...; mdf.kwargs...) + else + return T(mdf.args...; mdf.kwargs...) + end +end +function (mdf::ManifoldDefaultsFactory{T,<:AbstractManifold})() where {T} + if mdf.constructor_requires_manifold + return T(mdf.M, mdf.args...; mdf.kwargs...) + else + return T(mdf.args...; mdf.kwargs...) + end +end +function (mdf::ManifoldDefaultsFactory{T,Nothing})() where {T} + (!mdf.constructor_requires_manifold) && (return T(mdf.args...; mdf.kwargs...)) + throw(MethodError(T, mdf.args)) +end +""" + _produce_type(t::T, M::AbstractManifold) + _produce_type(t::ManifoldDefaultsFactory{T}, M::AbstractManifold) + +Use the [`ManifoldDefaultsFactory`](@ref)`{T}` to produce an instance of type `T`. +This acts transparent in the way that if you provide an instance `t::T` already, this will +just be returned. 
+""" +_produce_type(t, M::AbstractManifold) = t +_produce_type(t::ManifoldDefaultsFactory, M::AbstractManifold) = t(M) + +function show(io::IO, mdf::ManifoldDefaultsFactory{T,M}) where {T,M} + rm = mdf.constructor_requires_manifold + if M === Nothing + mline = "without a default manifold" + else + mline = "Default manifold: $(mdf.M)" + (!rm) && (mline = "$mline and the constructor does also not require a manifold.") + end + ar_s = length(mdf.args) == 0 ? " none" : "\n$(join([" * $s" for s in mdf.args],"\n"))" + kw_s = if length(mdf.kwargs) == 0 + " none" + else + "\n$(join([" * $(s.first)=$(repr(s.second))" for s in mdf.kwargs],"\n"))" + end + s = """ + ManifoldDefaultsFactory($T) + * $mline + + * Arguments:$(ar_s) + + * Keyword arguments:$(kw_s) + """ + return print(io, s) +end diff --git a/src/plans/nonlinear_least_squares_plan.jl b/src/plans/nonlinear_least_squares_plan.jl index b49b26e065..007353d529 100644 --- a/src/plans/nonlinear_least_squares_plan.jl +++ b/src/plans/nonlinear_least_squares_plan.jl @@ -8,7 +8,7 @@ A type for nonlinear least squares problems. Specify a nonlinear least squares problem # Fields -* `f` a function ``f: $(_l_M) → ℝ^d`` to minimize +* `f` a function ``f: $(_math(:M)) → ℝ^d`` to minimize * `jacobian!!` Jacobian of the function ``f`` * `jacobian_tangent_basis` the basis of tangent space used for computing the Jacobian. * `num_components` number of values returned by `f` (equal to `d`). @@ -47,11 +47,8 @@ function NonlinearLeastSquaresObjective( jacobian_f::TJ, num_components::Int; evaluation::AbstractEvaluationType=AllocatingEvaluation(), - jacB=nothing, - jacobian_tangent_basis::TB=isnothing(jacB) ? 
DefaultOrthonormalBasis() : jacB, + jacobian_tangent_basis::TB=DefaultOrthonormalBasis(), ) where {TF,TJ,TB<:AbstractBasis} - !isnothing(jacB) && - (@warn "The keyword `jacB` is deprecated, use `jacobian_tangent_basis` instead.") return NonlinearLeastSquaresObjective{typeof(evaluation),TF,TJ,TB}( f, jacobian_f, jacobian_tangent_basis, num_components ) @@ -141,11 +138,11 @@ Describes a Gradient based descent algorithm, with A default value is given in brackets if a parameter can be left out in initialization. -* $_field_iterate -* $_field_stop -* $_field_retr +$(_var(:Field, :p; add=[:as_Iterate])) +$(_var(:Field, :retraction_method)) * `residual_values`: value of ``F`` calculated in the solver setup or the previous iteration * `residual_values_temp`: value of ``F`` for the current proposal point +$(_var(:Field, :stopping_criterion, "stop")) * `jacF`: the current Jacobian of ``F`` * `gradient`: the current gradient of ``F`` * `step_vector`: the tangent vector at `x` that is used to move to the next point @@ -161,7 +158,7 @@ A default value is given in brackets if a parameter can be left out in initializ # Constructor - LevenbergMarquardtState(M, p, initial_residual_values, initial_jacF; kwargs...) + LevenbergMarquardtState(M, initial_residual_values, initial_jacF; kwargs...) Generate the Levenberg-Marquardt solver state. 
@@ -173,9 +170,9 @@ The following fields are keyword arguments * `damping_term_min=0.1` * `η=0.2`, * `expect_zero_residual=false` -* `initial_gradient=`$(_link_zero_vector()) -* $_kw_retraction_method_default -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$_sc_any[`StopWhenGradientNormLess`](@ref)`(1e-12)`$_sc_any[`StopWhenStepsizeLess`](@ref)`(1e-12)` +* `initial_gradient=`$(_link(:zero_vector)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-12)`$(_sc(:Any))[`StopWhenStepsizeLess`](@ref)`(1e-12)`")) # See also @@ -207,10 +204,10 @@ mutable struct LevenbergMarquardtState{ last_step_successful::Bool function LevenbergMarquardtState( M::AbstractManifold, - p::P, initial_residual_values::Tresidual_values, - initial_jacF::TJac, - initial_gradient::TGrad=zero_vector(M, p); + initial_jacF::TJac; + p::P=rand(M), + X::TGrad=zero_vector(M, p), stopping_criterion::StoppingCriterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-12) | StopWhenStepsizeLess(1e-12), @@ -249,8 +246,8 @@ mutable struct LevenbergMarquardtState{ initial_residual_values, copy(initial_residual_values), initial_jacF, - initial_gradient, - allocate(M, initial_gradient), + X, + allocate(M, X), zero(Tparams), η, damping_term_min, diff --git a/src/plans/objective.jl b/src/plans/objective.jl index b189beb75a..4972b5a6b3 100644 --- a/src/plans/objective.jl +++ b/src/plans/objective.jl @@ -75,6 +75,52 @@ function ReturnManifoldObjective( return ReturnManifoldObjective{E,O2,O1}(o) end +# +# Internal converters if the variable in the high-level interface is a number. +# + +function _ensure_mutating_cost(cost, q::Number) + return isnothing(cost) ? 
cost : (M, p) -> cost(M, p[]) +end +function _ensure_mutating_cost(cost, p) + return cost +end + +function _ensure_mutating_gradient(grad_f, p, evaluation::AbstractEvaluationType) + return grad_f +end +function _ensure_mutating_gradient(grad_f, q::Number, evaluation::AllocatingEvaluation) + return isnothing(grad_f) ? grad_f : (M, p) -> [grad_f(M, p[])] +end +function _ensure_mutating_gradient(grad_f, q::Number, evaluation::InplaceEvaluation) + return isnothing(grad_f) ? grad_f : (M, X, p) -> (X .= [grad_f(M, p[])]) +end + +function _ensure_mutating_hessian(hess_f, p, evaluation::AbstractEvaluationType) + return hess_f +end +function _ensure_mutating_hessian(hess_f, q::Number, evaluation::AllocatingEvaluation) + return isnothing(hess_f) ? hess_f : (M, p, X) -> [hess_f(M, p[], X[])] +end +function _ensure_mutating_hessian(hess_f, q::Number, evaluation::InplaceEvaluation) + return isnothing(hess_f) ? hess_f : (M, Y, p, X) -> (Y .= [hess_f(M, p[], X[])]) +end + +function _ensure_mutating_prox(prox_f, p, evaluation::AbstractEvaluationType) + return prox_f +end +function _ensure_mutating_prox(prox_f, q::Number, evaluation::AllocatingEvaluation) + return isnothing(prox_f) ? prox_f : (M, λ, p) -> [prox_f(M, λ, p[])] +end +function _ensure_mutating_prox(prox_f, q::Number, evaluation::InplaceEvaluation) + return isnothing(prox_f) ? prox_f : (M, q, λ, p) -> (q .= [prox_f(M, λ, p[])]) +end + +_ensure_mutating_variable(p) = p +_ensure_mutating_variable(q::Number) = [q] +_ensure_matching_output(::T, q::Vector{T}) where {T} = length(q) == 1 ? q[] : q +_ensure_matching_output(p, q) = q + """ dispatch_objective_decorator(o::AbstractManoptSolverState) @@ -120,7 +166,7 @@ function _get_objective(o::AbstractManifoldObjective, ::Val{true}, rec=true) end """ - set_manopt_parameter!(amo::AbstractManifoldObjective, element::Symbol, args...) + set_parameter!(amo::AbstractManifoldObjective, element::Symbol, args...) 
Set a certain `args...` from the [`AbstractManifoldObjective`](@ref) `amo` to `value`.
This function should dispatch on `Val(element)`.
@@ -129,14 +175,14 @@ Currently supported
* `:Cost` passes to the [`get_cost_function`](@ref)
* `:Gradient` passes to the [`get_gradient_function`](@ref)
"""
-set_manopt_parameter!(amo::AbstractManifoldObjective, e::Symbol, args...)
+set_parameter!(amo::AbstractManifoldObjective, e::Symbol, args...)

-function set_manopt_parameter!(amo::AbstractManifoldObjective, ::Val{:Cost}, args...)
-    set_manopt_parameter!(get_cost_function(amo), args...)
+function set_parameter!(amo::AbstractManifoldObjective, ::Val{:Cost}, args...)
+    set_parameter!(get_cost_function(amo), args...)
     return amo
 end
-function set_manopt_parameter!(amo::AbstractManifoldObjective, ::Val{:Gradient}, args...)
-    set_manopt_parameter!(get_gradient_function(amo), args...)
+function set_parameter!(amo::AbstractManifoldObjective, ::Val{:Gradient}, args...)
+    set_parameter!(get_gradient_function(amo), args...)
     return amo
 end
diff --git a/src/plans/plan.jl b/src/plans/plan.jl
index d6cf2cf8af..9a5465654c 100644
--- a/src/plans/plan.jl
+++ b/src/plans/plan.jl
@@ -10,20 +10,20 @@ It might also be more verbose in explaining, or hide internal information.
 status_summary(e) = "$(e)"

 """
-    set_manopt_parameter!(f, element::Symbol , args...)
+    set_parameter!(f, element::Symbol , args...)

For any `f` and a `Symbol` `e`, dispatch on its value so by default, to set
some `args...` in `f` or one of its sub elements.
"""
-function set_manopt_parameter!(f, e::Symbol, args...)
-    return set_manopt_parameter!(f, Val(e), args...)
+function set_parameter!(f, e::Symbol, args...)
+    return set_parameter!(f, Val(e), args...)
 end
-function set_manopt_parameter!(f, args...)
+function set_parameter!(f, args...)
     return f
 end

 """
-    get_manopt_parameter(f, element::Symbol, args...)
+    get_parameter(f, element::Symbol, args...)

Access arbitrary parameters from `f` addressed by a symbol `element`.
@@ -32,13 +32,13 @@ get some element from `f` potentially further qualified by `args...`. This functions returns `nothing` if `f` does not have the property `element` """ -function get_manopt_parameter(f, e::Symbol, args...) - return get_manopt_parameter(f, Val(e), args...) +function get_parameter(f, e::Symbol, args...) + return get_parameter(f, Val(e), args...) end -get_manopt_parameter(f, args...) = nothing +get_parameter(f, args...) = nothing """ - get_manopt_parameter(element::Symbol; default=nothing) + get_parameter(element::Symbol; default=nothing) Access global [`Manopt`](@ref) parameters addressed by a symbol `element`. This first dispatches on the value of `element`. @@ -56,21 +56,19 @@ the optimisation on manifolds is different from the usual “experience” in (classical, Euclidean) optimization. Any other value has the same effect as not setting it. """ -function get_manopt_parameter( - e::Symbol, args...; default=get_manopt_parameter(Val(e), Val(:default)) -) +function get_parameter(e::Symbol, args...; default=get_parameter(Val(e), Val(:default))) return @load_preference("$(e)", default) end -function get_manopt_parameter( - e::Symbol, s::Symbol, args...; default=get_manopt_parameter(Val(e), Val(:default)) +function get_parameter( + e::Symbol, s::Symbol, args...; default=get_parameter(Val(e), Val(:default)) ) return @load_preference("$(e)", default) end# Handle empty defaults -get_manopt_parameter(::Symbol, ::Val{:default}) = nothing -get_manopt_parameter(::Val{:Mode}, v::Val{:default}) = nothing +get_parameter(::Symbol, ::Val{:default}) = nothing +get_parameter(::Val{:Mode}, v::Val{:default}) = nothing """ - set_manopt_parameter!(element::Symbol, value::Union{String,Bool,<:Number}) + set_parameter!(element::Symbol, value::Union{String,Bool,<:Number}) Set global [`Manopt`](@ref) parameters addressed by a symbol `element`. 
W @@ -81,10 +79,10 @@ The parameters are stored to the global settings using [`Preferences.jl`](https: Passing a `value` of `""` deletes the corresponding entry from the preferences. Whenever the `LocalPreferences.toml` is modified, this is also issued as an `@info`. """ -function set_manopt_parameter!(e::Symbol, value::Union{String,Bool,<:Number}) +function set_parameter!(e::Symbol, value::Union{String,Bool,<:Number}) if length(value) == 0 @delete_preferences!(string(e)) - v = get_manopt_parameter(e, Val(:default)) + v = get_parameter(e, Val(:default)) default = isnothing(v) ? "" : ((v isa String) ? " \"$v\"" : " ($v)") @info("Resetting the `Manopt.jl` parameter :$(e) to default$(default).") else @@ -98,12 +96,12 @@ end A small internal helper to indicate whether tutorial mode is active. -You can set the mode by calling `set_manopt_parameter!(:Mode, "Tutorial")` or deactivate it -by `set_manopt_parameter!(:Mode, "")`. +You can set the mode by calling `set_parameter!(:Mode, "Tutorial")` or deactivate it +by `set_parameter!(:Mode, "")`. """ -is_tutorial_mode() = (get_manopt_parameter(:Mode) == "Tutorial") +is_tutorial_mode() = (get_parameter(:Mode) == "Tutorial") -include("docstring_snippets.jl") +include("manifold_default_factory.jl") include("objective.jl") include("problem.jl") include("solver_state.jl") diff --git a/src/plans/primal_dual_plan.jl b/src/plans/primal_dual_plan.jl index eec88ac43e..5d634a3247 100644 --- a/src/plans/primal_dual_plan.jl +++ b/src/plans/primal_dual_plan.jl @@ -96,14 +96,14 @@ function PrimalDualManifoldObjective( ) end -@doc raw""" +@doc """ q = get_primal_prox(M::AbstractManifold, p::AbstractPrimalDualManifoldObjective, σ, p) get_primal_prox!(M::AbstractManifold, p::AbstractPrimalDualManifoldObjective, q, σ, p) Evaluate the proximal map of ``F`` stored within [`AbstractPrimalDualManifoldObjective`](@ref) ```math -\operatorname{prox}_{σF}(x) +$(_tex(:prox))_{σF}(x) ``` which can also be computed in place of `y`. 
@@ -164,14 +164,14 @@ function get_primal_prox!(
     return get_primal_prox!(M, q, get_objective(admo, false), σ, p)
 end

-@doc raw"""
+@doc """
    Y = get_dual_prox(N::AbstractManifold, apdmo::AbstractPrimalDualManifoldObjective, n, τ, X)
    get_dual_prox!(N::AbstractManifold, apdmo::AbstractPrimalDualManifoldObjective, Y, n, τ, X)

Evaluate the proximal map of ``g_n^*`` stored within [`AbstractPrimalDualManifoldObjective`](@ref)

```math
-  Y = \operatorname{prox}_{τG_n^*}(X)
+  Y = $(_tex(:prox))_{τG_n^*}(X)
```

which can also be computed in place of `Y`.
diff --git a/src/plans/problem.jl b/src/plans/problem.jl
index 9b68848bac..1e00295b39 100644
--- a/src/plans/problem.jl
+++ b/src/plans/problem.jl
@@ -82,7 +82,7 @@ evaluate the cost function `f` defined on `M` stored within the [`AbstractManifo
 get_cost(::AbstractManifold, ::AbstractManifoldObjective, p)

 """
-    set_manopt_parameter!(ams::AbstractManoptProblem, element::Symbol, field::Symbol , value)
+    set_parameter!(ams::AbstractManoptProblem, element::Symbol, field::Symbol , value)

Set a certain field/element from the [`AbstractManoptProblem`](@ref) `ams` to `value`.
This function usually dispatches on `Val(element)`.
@@ -91,17 +91,17 @@ encapsulated parts of the problem.

Main values for `element` are `:Manifold` and `:Objective`.
"""
-set_manopt_parameter!(amp::AbstractManoptProblem, e::Symbol, args...)
+set_parameter!(amp::AbstractManoptProblem, e::Symbol, args...)

-function set_manopt_parameter!(amp::AbstractManoptProblem, ::Val{:Manifold}, args...)
-    set_manopt_parameter!(get_manifold(amp), args...)
+function set_parameter!(amp::AbstractManoptProblem, ::Val{:Manifold}, args...)
+    set_parameter!(get_manifold(amp), args...)
return amp
 end
-function set_manopt_parameter!(TpM::TangentSpace, ::Union{Val{:Basepoint},Val{:p}}, p)
+function set_parameter!(TpM::TangentSpace, ::Union{Val{:Basepoint},Val{:p}}, p)
     copyto!(TpM.manifold, TpM.point, p)
     return TpM
 end
-function set_manopt_parameter!(amp::AbstractManoptProblem, ::Val{:Objective}, args...)
-    set_manopt_parameter!(get_objective(amp), args...)
+function set_parameter!(amp::AbstractManoptProblem, ::Val{:Objective}, args...)
+    set_parameter!(get_objective(amp), args...)
     return amp
 end
diff --git a/src/plans/proximal_plan.jl b/src/plans/proximal_plan.jl
index 9cce485c1c..5ff4948016 100644
--- a/src/plans/proximal_plan.jl
+++ b/src/plans/proximal_plan.jl
@@ -3,16 +3,16 @@
 #
 # Proximal Point Problem and State
 #
 #
-@doc raw"""
+@doc """
    ManifoldProximalMapObjective{E<:AbstractEvaluationType, TC, TP, V <: Vector{<:Integer}} <: AbstractManifoldCostObjective{E, TC}

specify a problem for solvers based on the evaluation of proximal maps.

# Fields

-* `cost`: a function ``F:\mathcal M→ℝ`` to
+* `cost`: a function ``F:$(_tex(:Cal, "M"))→ℝ`` to
  minimize
-* `proxes`: proximal maps ``\operatorname{prox}_{λ\varphi}:\mathcal M→\mathcal M``
+* `proxes`: proximal maps ``$(_tex(:prox))_{λφ}:$(_tex(:Cal, "M")) → $(_tex(:Cal, "M"))``
  as functions `(M, λ, p) -> q`.
* `number_of_proxes`: number of proximal maps per function, to specify when one of the maps is a combined one such that the proximal maps
@@ -139,21 +139,24 @@ stores options for the [`cyclic_proximal_point`](@ref) algorithm. These are the

# Fields

-* $_field_p
-* $_field_stop
+$(_var(:Field, :p; add=[:as_Iterate]))
+$(_var(:Field, :stopping_criterion, "stop"))
* `λ`: a function for the values of ``λ_k`` per iteration (cycle ``i``)
* `order_type`: whether to use a randomly permuted sequence (`:FixedRandomOrder`), a per cycle permuted sequence (`:RandomOrder`) or the default linear one.

# Constructor

-    CyclicProximalPointState(M, p; kwargs...)
+    CyclicProximalPointState(M; kwargs...)
+ +Generate the options -Generate the options with the following keyword arguments +# Keyword arguments -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(2000)` -* `λ=i -> 1.0 / i` a function to compute the ``λ_k, k ∈ $(_l_Manifold("N"))``, * `evaluation_order=:LinearOrder`: soecify the `order_type` +* `λ=i -> 1.0 / i` a function to compute the ``λ_k, k ∈ $(_tex(:Cal, "N"))``, +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(2000)`")) # See also @@ -169,8 +172,8 @@ mutable struct CyclicProximalPointState{P,TStop<:StoppingCriterion,Tλ} <: end function CyclicProximalPointState( - ::AbstractManifold, - p::P; + M::AbstractManifold; + p::P=rand(M), stopping_criterion::S=StopAfterIteration(2000), λ::F=(i) -> 1.0 / i, evaluation_order::Symbol=:LinearOrder, diff --git a/src/plans/quasi_newton_plan.jl b/src/plans/quasi_newton_plan.jl index f0bd5eb5b4..deea6e6b89 100644 --- a/src/plans/quasi_newton_plan.jl +++ b/src/plans/quasi_newton_plan.jl @@ -311,27 +311,27 @@ where the operator is stored as a matrix. A distinction is made between the upda approximation of the Hessian, $_doc_QN_H_update, and the update of the approximation of the Hessian inverse, $_doc_QN_B_update. For the first case, the coordinates of the search direction ``η_k`` with respect to -a basis ``$(_math_sequence("b", "i", "1", "n"))`` are determined by solving a linear system of equations +a basis ``$(_math(:Sequence, "b", "i", "1", "n"))`` are determined by solving a linear system of equations $_doc_QN_H_full_system -where ``H_k`` is the matrix representing the operator with respect to the basis ``$(_math_sequence("b", "i", "1", "n"))`` -and ``\\widehat{$_l_grad} f(p_k)}`` represents the coordinates of the gradient of -the objective function ``f`` in ``x_k`` with respect to the basis ``$(_math_sequence("b", "i", "1", "n"))``. 
+where ``H_k`` is the matrix representing the operator with respect to the basis ``$(_math(:Sequence, "b", "i", "1", "n"))``
+and ``\\widehat{$(_tex(:grad)) f(p_k)}`` represents the coordinates of the gradient of
+the objective function ``f`` in ``p_k`` with respect to the basis ``$(_math(:Sequence, "b", "i", "1", "n"))``.

If a method is chosen where Hessian inverse is approximated, the coordinates of the search
-direction ``η_k`` with respect to a basis ``$(_math_sequence("b", "i", "1", "n"))`` are obtained simply by
+direction ``η_k`` with respect to a basis ``$(_math(:Sequence, "b", "i", "1", "n"))`` are obtained simply by
matrix-vector multiplication

$_doc_QN_B_full_system

-where ``B_k`` is the matrix representing the operator with respect to the basis ``$(_math_sequence("b", "i", "1", "n"))``
-and `\\widehat{$_l_grad} f(p_k)}``. In the end, the search direction ``η_k`` is
-generated from the coordinates ``\\hat{eta_k}`` and the vectors of the basis ``$(_math_sequence("b", "i", "1", "n"))``
+where ``B_k`` is the matrix representing the operator with respect to the basis ``$(_math(:Sequence, "b", "i", "1", "n"))``
+and ``\\widehat{$(_tex(:grad)) f(p_k)}``. In the end, the search direction ``η_k`` is
+generated from the coordinates ``\\hat{η}_k`` and the vectors of the basis ``$(_math(:Sequence, "b", "i", "1", "n"))``
in both variants.
The [`AbstractQuasiNewtonUpdateRule`](@ref) indicates which quasi-Newton update rule is used.
In all of them, the Euclidean update formula is used to generate the matrix ``H_{k+1}``
-and ``B_{k+1}``, and the basis ``$(_math_sequence("b", "i", "1", "n"))`` is transported into the upcoming tangent
-space ``T_{p_{k+1}} $_l_M``, preferably with an isometric vector transport, or generated there.
+and ``B_{k+1}``, and the basis ``$(_math(:Sequence, "b", "i", "1", "n"))`` is transported into the upcoming tangent
+space ``T_{p_{k+1}} $(_tex(:Cal, "M"))``, preferably with an isometric vector transport, or generated there.
# Provided functors @@ -342,9 +342,9 @@ space ``T_{p_{k+1}} $_l_M``, preferably with an isometric vector transport, or g * `basis`: an `AbstractBasis` to use in the tangent spaces * `matrix`: the matrix which represents the approximating operator. -* `scale`: (`true) indicates whether the initial matrix (= identity matrix) should be scaled before the first update. +* `initial_scale`: when initialising the update, a unit matrix is used as initial approximation, scaled by this factor * `update`: a [`AbstractQuasiNewtonUpdateRule`](@ref). -* $_field_vector_transp +$(_var(:Field, :vector_transport_method)) # Constructor @@ -358,8 +358,8 @@ space ``T_{p_{k+1}} $_l_M``, preferably with an isometric vector transport, or g ## Keyword arguments -* `scale=true` -* $_kw_vector_transport_method_default +* `initial_scale=1.0` +$(_var(:Keyword, :vector_transport_method)) Generate the Update rule with defaults from a manifold and the names corresponding to the fields. @@ -374,19 +374,20 @@ mutable struct QuasiNewtonMatrixDirectionUpdate{ B<:AbstractBasis, VT<:AbstractVectorTransportMethod, M<:AbstractMatrix, + F<:Real, } <: AbstractQuasiNewtonDirectionUpdate basis::B matrix::M - scale::Bool + initial_scale::F update::NT vector_transport_method::VT end function status_summary(d::QuasiNewtonMatrixDirectionUpdate) - return "$(d.update) with initial scaling $(d.scale) and vector transport method $(d.vector_transport_method)." + return "$(d.update) with initial scaling $(d.initial_scale) and vector transport method $(d.vector_transport_method)." 
end function show(io::IO, d::QuasiNewtonMatrixDirectionUpdate) s = """ - QuasiNewtonMatrixDirectionUpdate($(d.basis), $(d.matrix), $(d.scale), $(d.update), $(d.vector_transport_method)) + QuasiNewtonMatrixDirectionUpdate($(d.basis), $(d.matrix), $(d.initial_scale), $(d.update), $(d.vector_transport_method)) """ return print(io, s) end @@ -394,18 +395,18 @@ function QuasiNewtonMatrixDirectionUpdate( M::AbstractManifold, update::U, basis::B=DefaultOrthonormalBasis(), - m::MT=Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M)), - ; - scale::Bool=true, + m::MT=Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M)); + initial_scale::F=1.0, vector_transport_method::V=default_vector_transport_method(M), ) where { U<:AbstractQuasiNewtonUpdateRule, MT<:AbstractMatrix, B<:AbstractBasis, V<:AbstractVectorTransportMethod, + F<:Real, } - return QuasiNewtonMatrixDirectionUpdate{U,B,V,MT}( - basis, m, scale, update, vector_transport_method + return QuasiNewtonMatrixDirectionUpdate{U,B,V,MT,F}( + basis, m, initial_scale, update, vector_transport_method ) end function (d::QuasiNewtonMatrixDirectionUpdate)(mp, st) @@ -435,32 +436,37 @@ function initialize_update!(d::QuasiNewtonMatrixDirectionUpdate) return d end -@doc raw""" +_doc_QN_B = raw""" +```math +\mathcal{B}^{(0)}_k[⋅] = \frac{g_{p_k}(s_{k-1}, y_{k-1})}{g_{p_k}(y_{k-1}, y_{k-1})} \; \mathrm{id}_{T_{p_k} \mathcal{M}}[⋅] +``` +""" + +@doc """ QuasiNewtonLimitedMemoryDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate This [`AbstractQuasiNewtonDirectionUpdate`](@ref) represents the limited-memory Riemannian BFGS update, where the approximating operator is represented by ``m`` stored pairs of tangent -vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}`` in the ``k``-th iteration. 
-For the calculation of the search direction ``η_k``, the generalisation of the two-loop recursion
+vectors ``$(_math(:Sequence, _tex(:widehat, "s"), "i", "k-m", "k-1"))`` and ``$(_math(:Sequence, _tex(:widehat, "y"), "i", "k-m", "k-1"))`` in the ``k``-th iteration.
+For the calculation of the search direction ``X_k``, the generalisation of the two-loop recursion
is used (see [HuangGallivanAbsil:2015](@cite)),
-since it only requires inner products and linear combinations of tangent vectors in ``T_{x_k} \mathcal{M}``.
-For that the stored pairs of tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}``,
-the gradient ``\operatorname{grad}f(x_k)`` of the objective function ``f`` in ``x_k``
+since it only requires inner products and linear combinations of tangent vectors in ``$(_math(:TpM; p="p_k"))``.
+For that the stored pairs of tangent vectors ``$(_tex(:widehat, "s"))_i, $(_tex(:widehat, "y"))_i``,
+the gradient ``$(_tex(:grad)) f(p_k)`` of the objective function ``f`` in ``p_k``
and the positive definite self-adjoint operator

-```math
-\mathcal{B}^{(0)}_k[⋅] = \frac{g_{x_k}(s_{k-1}, y_{k-1})}{g_{x_k}(y_{k-1}, y_{k-1})} \; \mathrm{id}_{T_{x_k} \mathcal{M}}[⋅]
-```
+$(_doc_QN_B)

are used. The two-loop recursion can be understood as that the [`InverseBFGS`](@ref) update
-is executed ``m`` times in a row on ``\mathcal{B}^{(0)}_k[⋅]`` using the tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}``, and in the same time the resulting operator ``\mathcal{B}^{LRBFGS}_k [⋅]`` is directly applied on ``\operatorname{grad}f(x_k)``.
+is executed ``m`` times in a row on ``$(_tex(:Cal, "B"))^{(0)}_k[⋅]`` using the tangent vectors ``$(_tex(:widehat, "s"))_i, $(_tex(:widehat, "y"))_i``,
+and at the same time the resulting operator ``$(_tex(:Cal, "B"))^{LRBFGS}_k [⋅]`` is directly applied on ``$(_tex(:grad)) f(p_k)``.
When updating there are two cases: if there is still free memory, ``k < m``, the previously
-stored vector pairs ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}`` have to be
-transported into the upcoming tangent space ``T_{x_{k+1}} \mathcal{M}``.
-If there is no free memory, the oldest pair ``\{ \widetilde{s}_{k−m}, \widetilde{y}_{k−m}\}``
-has to be discarded and then all the remaining vector pairs ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m+1}^{k-1}``
-are transported into the tangent space ``T_{x_{k+1}} \mathcal{M}``.
-After that the new values ``s_k = \widetilde{s}_k = T^{S}_{x_k, α_k η_k}(α_k η_k)`` and ``y_k = \widetilde{y}_k``
+stored vector pairs ``$(_tex(:widehat, "s"))_i, $(_tex(:widehat, "y"))_i`` have to be
+transported into the upcoming tangent space ``$(_math(:TpM; p="p_{k+1}"))``.
+If there is no free memory, the oldest pair ``$(_tex(:widehat, "s"))_{k-m}, $(_tex(:widehat, "y"))_{k-m}``
+has to be discarded and then all the remaining vector pairs ``$(_tex(:widehat, "s"))_i, $(_tex(:widehat, "y"))_i``
+are transported into the tangent space ``$(_math(:TpM; p="p_{k+1}"))``.
+After that the new values ``s_k = $(_tex(:widehat, "s"))_k = T^{S}_{p_k, α_k η_k}(α_k η_k)`` and ``y_k = $(_tex(:widehat, "y"))_k``
are stored at the beginning. This process
ensures that new information about the objective
function is always included and the old, probably no longer relevant, information is discarded.
@@ -471,23 +477,25 @@ function is always included and the old, probably no longer relevant, informatio

# Fields

-* `memory_s` the set of the stored (and transported) search directions times step size ``\{ \widetilde{s}_i\}_{i=k-m}^{k-1}``.
-* `memory_y` set of the stored gradient differences ``\{ \widetilde{y}_i\}_{i=k-m}^{k-1}``.
-* `ξ` a variable used in the two-loop recursion.
-* `ρ` a variable used in the two-loop recursion.
-* `scale` initial scaling of the Hessian
-* `vector_transport_method` a `AbstractVectorTransportMethod`
-* `message` a string containing a potential warning that might have appeared
+* `memory_s`: the set of the stored (and transported) search directions times step size ``$(_math(:Sequence, _tex(:widehat, "s"), "i", "k-m", "k-1"))``.
+* `memory_y`: set of the stored gradient differences ``$(_math(:Sequence, _tex(:widehat, "y"), "i", "k-m", "k-1"))``.
+* `ξ`: a variable used in the two-loop recursion.
+* `ρ`: a variable used in the two-loop recursion.
+* `initial_scale`: initial scaling of the Hessian
+$(_var(:Field, :vector_transport_method))
+* `message`: a string containing a potential warning that might have appeared
+* `project!`: a function to stabilize the update by projecting on the tangent space

# Constructor

+
+    QuasiNewtonLimitedMemoryDirectionUpdate(
        M::AbstractManifold,
        x,
        update::AbstractQuasiNewtonUpdateRule,
        memory_size;
        initial_vector=zero_vector(M,x),
-        scale::Real=1.0
-        project::Bool=true
+        initial_scale::Real=1.0,
+        (project!)=copyto!
) # See also @@ -502,13 +510,14 @@ mutable struct QuasiNewtonLimitedMemoryDirectionUpdate{ F, V<:AbstractVector{F}, VT<:AbstractVectorTransportMethod, + Proj, } <: AbstractQuasiNewtonDirectionUpdate memory_s::CircularBuffer{T} memory_y::CircularBuffer{T} ξ::Vector{F} ρ::Vector{F} - scale::F - project::Bool + initial_scale::F + project!::Proj vector_transport_method::VT message::String end @@ -518,22 +527,22 @@ function QuasiNewtonLimitedMemoryDirectionUpdate( ::NT, memory_size::Int; initial_vector::T=zero_vector(M, p), - scale::Real=1.0, - project::Bool=true, + initial_scale::Real=1.0, + (project!)::Proj=copyto!, vector_transport_method::VTM=default_vector_transport_method(M, typeof(p)), -) where {NT<:AbstractQuasiNewtonUpdateRule,T,VTM<:AbstractVectorTransportMethod} +) where {NT<:AbstractQuasiNewtonUpdateRule,T,VTM<:AbstractVectorTransportMethod,Proj} mT = allocate_result_type( - M, QuasiNewtonLimitedMemoryDirectionUpdate, (p, initial_vector, scale) + M, QuasiNewtonLimitedMemoryDirectionUpdate, (p, initial_vector, initial_scale) ) m1 = zeros(mT, memory_size) m2 = zeros(mT, memory_size) - return QuasiNewtonLimitedMemoryDirectionUpdate{NT,T,mT,typeof(m1),VTM}( + return QuasiNewtonLimitedMemoryDirectionUpdate{NT,T,mT,typeof(m1),VTM,Proj}( CircularBuffer{T}(memory_size), CircularBuffer{T}(memory_size), m1, m2, - convert(mT, scale), - project, + convert(mT, initial_scale), + project!, vector_transport_method, "", ) @@ -541,8 +550,8 @@ end get_message(d::QuasiNewtonLimitedMemoryDirectionUpdate) = d.message function status_summary(d::QuasiNewtonLimitedMemoryDirectionUpdate{T}) where {T} s = "limited memory $T (size $(length(d.memory_s)))" - (d.scale != 1.0) && (s = "$(s) initial scaling $(d.scale)") - d.project && (s = "$(s), projections, ") + (d.initial_scale != 1.0) && (s = "$(s) initial scaling $(d.initial_scale)") + (d.project! !== copyto!) && (s = "$(s), projections, ") s = "$(s)and $(d.vector_transport_method) as vector transport." 
return s end @@ -593,7 +602,8 @@ function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(r, mp, st) return r end # initial scaling guess - r ./= d.ρ[last_safe_index] * norm(M, p, d.memory_y[last_safe_index])^2 + r .*= + d.initial_scale / (d.ρ[last_safe_index] * norm(M, p, d.memory_y[last_safe_index])^2) # forward pass for i in eachindex(d.ρ) if abs(d.ρ[i]) > 0 @@ -601,7 +611,8 @@ function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(r, mp, st) r .+= coeff .* d.memory_s[i] end end - d.project && embed_project!(M, r, p, r) + # potentially stabilize step by projecting. + d.project!(M, r, p, r) r .*= -1 return r end diff --git a/src/plans/record.jl b/src/plans/record.jl index d639f9313d..bb5bc213c0 100644 --- a/src/plans/record.jl +++ b/src/plans/record.jl @@ -103,23 +103,26 @@ _has_record(s::AbstractManoptSolverState, ::Val{true}) = has_record(s.state) _has_record(::AbstractManoptSolverState, ::Val{false}) = false """ - set_manopt_parameter!(ams::RecordSolverState, ::Val{:Record}, args...) + set_parameter!(ams::RecordSolverState, ::Val{:Record}, args...) Set certain values specified by `args...` into the elements of the `recordDictionary` """ -function set_manopt_parameter!(rss::RecordSolverState, ::Val{:Record}, args...) +function set_parameter!(rss::RecordSolverState, ::Val{:Record}, args...) for d in values(rss.recordDictionary) - set_manopt_parameter!(d, args...) + set_parameter!(d, args...) end return rss end # all other pass through -function set_manopt_parameter!(rss::RecordSolverState, v::Val{T}, args...) where {T} - return set_manopt_parameter!(rss.state, v, args...) +function set_parameter!(rss::RecordSolverState, v::Val{T}, args...) where {T} + return set_parameter!(rss.state, v, args...) +end +function set_parameter!(rss::RecordSolverState, v::Val{:StoppingCriterion}, args...) + return set_parameter!(rss.state, v, args...) end # all other pass through -function get_manopt_parameter(rss::RecordSolverState, v::Val{T}, args...) 
where {T} - return get_manopt_parameter(rss.state, v, args...) +function get_parameter(rss::RecordSolverState, v::Val{T}, args...) where {T} + return get_parameter(rss.state, v, args...) end @doc """ @@ -243,7 +246,7 @@ function (re::RecordEvery)( # Set activity to activate or deactivate subsolvers # note that since recording is happening at the end # sets activity for the _next_ iteration - set_manopt_parameter!( + set_parameter!( ams, :SubState, :Record, :Activity, !(k < 1) && (rem(k + 1, re.every) == 0) ) return nothing @@ -463,11 +466,11 @@ end function status_summary(rwa::RecordWhenActive) return repr(rwa) end -function set_manopt_parameter!(rwa::RecordWhenActive, v::Val, args...) - set_manopt_parameter!(rwa.record, v, args...) +function set_parameter!(rwa::RecordWhenActive, v::Val, args...) + set_parameter!(rwa.record, v, args...) return rwa end -function set_manopt_parameter!(rwa::RecordWhenActive, ::Val{:Activity}, v) +function set_parameter!(rwa::RecordWhenActive, ::Val{:Activity}, v) return rwa.active = v end get_record(r::RecordWhenActive, args...) = get_record(r.record, args...) @@ -499,7 +502,7 @@ end show(io::IO, ::RecordCost) = print(io, "RecordCost()") status_summary(di::RecordCost) = ":Cost" -@doc raw""" +@doc """ RecordChange <: RecordAction debug for the amount of change of the iterate (see [`get_iterate`](@ref)`(s)` of the [`AbstractManoptSolverState`](@ref)) @@ -510,7 +513,7 @@ during the last iteration. * `storage` : a [`StoreStateAction`](@ref) to store (at least) the last iterate to use this as the last value (to compute the change) serving as a potential cache shared with other components of the solver. -* `inverse_retraction_method` : the inverse retraction to be used for approximating distance. 
+$(_var(:Keyword, :inverse_retraction_method))
* `recorded_values` : to store the recorded values

# Constructor
@@ -532,14 +535,9 @@ mutable struct RecordChange{
    function RecordChange(
        M::AbstractManifold=DefaultManifold();
        storage::Union{Nothing,StoreStateAction}=nothing,
-        manifold::Union{Nothing,AbstractManifold}=nothing,
        inverse_retraction_method::IRT=default_inverse_retraction_method(M),
    ) where {IRT<:AbstractInverseRetractionMethod}
        irm = inverse_retraction_method
-        if !isnothing(manifold)
-            @warn "The `manifold` keyword is deprecated, use the first positional argument `M`. This keyword for now sets `inverse_retracion_method`."
-            irm = default_inverse_retraction_method(manifold)
-        end
        if isnothing(storage)
            if M isa DefaultManifold
                storage = StoreStateAction(M; store_fields=[:Iterate])
diff --git a/src/plans/solver_state.jl b/src/plans/solver_state.jl
index 77b56dc8b6..def63cb189 100644
--- a/src/plans/solver_state.jl
+++ b/src/plans/solver_state.jl
@@ -1,7 +1,7 @@
@inline _extract_val(::Val{T}) where {T} = T

-@doc raw"""
+@doc """
    AbstractManoptSolverState

A general super type for all solver states.
@@ -9,10 +9,10 @@ A general super type for all solver states.
# Fields

The following fields are assumed to be default. If you use different ones,
-provide the access functions accordingly
+adapt the access functions [`get_iterate`](@ref) and [`get_stopping_criterion`](@ref) accordingly

-* `p` a point on a manifold with the current iterate
-* `stop` a [`StoppingCriterion`](@ref).
+$(_var(:Field, :p; add=[:as_Iterate]))
+$(_var(:Field, :stopping_criterion, "stop"))
"""
abstract type AbstractManoptSolverState end

@@ -384,7 +384,7 @@ is necessity for the construction.
as vectors of symbols each referring to fields of the state
(lower case symbols) or semantic ones (upper case).
-* `p_init` (`rand(M)`)
+* `p_init` (`rand(M)`), but making sure this is not a number but a (mutable) array
* `X_init` (`zero_vector(M, p_init)`)

are used to initialize the point and vector storage, change these if you use other
@@ -442,7 +442,7 @@ end
        store_fields::Vector{Symbol}=Symbol[],
        store_points::Union{Type{<:Tuple},Vector{Symbol}}=Tuple{},
        store_vectors::Union{Type{<:Tuple},Vector{Symbol}}=Tuple{},
-        p_init=rand(M),
+        p_init=_ensure_mutating_variable(rand(M)),
        X_init=zero_vector(M, p_init),
        once=true,
    )
@@ -583,7 +583,7 @@ function update_storage!(
    elseif key === :Gradient
        a.values[key] = deepcopy(get_gradient(s))
    elseif key === :Population
-        swarm = get_manopt_parameter(s, key)
+        swarm = get_parameter(s, key)
        if !isnothing(swarm)
            a.values[key] = deepcopy.(swarm)
        end
diff --git a/src/plans/stepsize.jl b/src/plans/stepsize.jl
index 5485cdd4d9..55cddfcfc4 100644
--- a/src/plans/stepsize.jl
+++ b/src/plans/stepsize.jl
@@ -9,6 +9,9 @@ the interface `(p,o,i)` where a [`AbstractManoptProblem`](@ref) as well as [`Abs
and the current number of iterations are the arguments
and returns a number, namely the stepsize to use.

+For most step sizes it is advisable to employ a [`ManifoldDefaultsFactory`](@ref). The
+function creating the factory should then either be called `TypeOf` or, if that is confusing or too generic, `TypeOfLength`.
+
# See also

[`Linesearch`](@ref)
@@ -42,7 +45,7 @@ end
"""
    ConstantStepsize <: Stepsize

-A functor that always returns a fixed step size.
+A functor `(problem, state, ...) -> s` to provide a constant step size `s`.

# Fields

@@ -56,26 +59,20 @@ A functor that always returns a fixed step size.

initialize the stepsize to a constant `s` of type `t`.
-    ConstantStepsize(M::AbstractManifold=DefaultManifold(2);
-        stepsize=injectivity_radius(M)/2, type::Symbol=:relative
+    ConstantStepsize(
+        M::AbstractManifold=DefaultManifold(),
+        s=min(1.0, injectivity_radius(M)/2);
+        type::Symbol=:relative
    )
-
-initialize the stepsize to a constant `stepsize`, which by default is half the injectivity
-radius, unless the radius is infinity, then the default step size is `1`.
"""
-mutable struct ConstantStepsize{T} <: Stepsize
-    length::T
+mutable struct ConstantStepsize{R<:Real} <: Stepsize
+    length::R
    type::Symbol
end
function ConstantStepsize(
-    M::AbstractManifold=DefaultManifold(2);
-    stepsize=isinf(injectivity_radius(M)) ? 1.0 : injectivity_radius(M) / 2,
-    type=:relative,
-)
-    return ConstantStepsize{typeof(stepsize)}(stepsize, type)
-end
-function ConstantStepsize(stepsize::T) where {T<:Number}
-    return ConstantStepsize{T}(stepsize, :relative)
+    M::AbstractManifold, length::R=min(injectivity_radius(M) / 2, 1.0); type=:relative
+) where {R<:Real}
+    return ConstantStepsize{R}(length, type)
end
function (cs::ConstantStepsize)(
    amp::AbstractManoptProblem, ams::AbstractManoptSolverState, ::Any, args...; kwargs...
@@ -90,12 +87,37 @@ function (cs::ConstantStepsize)(
    return s
end
get_initial_stepsize(s::ConstantStepsize) = s.length
-show(io::IO, cs::ConstantStepsize) = print(io, "ConstantStepsize($(cs.length), $(cs.type))")
+function show(io::IO, cs::ConstantStepsize)
+    return print(io, "ConstantLength($(cs.length); type=:$(cs.type))")
+end

-@doc raw"""
+"""
+    ConstantLength(s; kwargs...)
+    ConstantLength(M::AbstractManifold, s; kwargs...)
+
+Specify a [`Stepsize`](@ref) that is constant.
+
+# Input
+
+* `M` (optional)
+* `s=min(injectivity_radius(M)/2, 1.0)`: the length to use.
+
+# Keyword argument
+
+* `type::Symbol=:relative`: specify the type of constant step size.
+  * `:relative` – scale the gradient tangent vector ``X`` to ``s*X``
+  * `:absolute` – scale the gradient to an absolute step length ``s``, that is ``$(_tex(:frac, "s", _tex(:norm, "X")))X``
+
+$(_note(:ManifoldDefaultFactory, "ConstantStepsize"))
+"""
+function ConstantLength(args...; kwargs...)
+    return ManifoldDefaultsFactory(Manopt.ConstantStepsize, args...; kwargs...)
+end
+
+@doc """
    DecreasingStepsize()

-A functor that represents several decreasing step sizes
+A functor `(problem, state, ...) -> s` to provide a decreasing step size `s`.

# Fields

@@ -110,52 +132,43 @@ A functor that represents several decreasing step sizes

In total the complete formulae reads for the ``i``th iterate as

-````math
-s_i = \frac{(l - i a)f^i}{(i+s)^e}
-````
+```math
+s_i = $(_tex(:frac, "(l - i a)f^i", "(i+s)^e"))
+```

and hence the default simplifies to just ``s_i = \frac{l}{i}``

# Constructor

-    DecreasingStepsize(l=1,f=1,a=0,e=1,s=0,type=:relative)
-
-Alternatively one can also use the following keyword.
-
-    DecreasingStepsize(M::AbstractManifold=DefaultManifold(3);
-        length=injectivity_radius(M)/2,
-        multiplier=1.0,
+    DecreasingStepsize(M::AbstractManifold;
+        length=min(injectivity_radius(M)/2, 1.0),
+        factor=1.0,
        subtrahend=0.0,
        exponent=1.0,
-        shift=0,
+        shift=0.0,
        type=:relative,
    )

initializes all fields, where none of them is mandatory and the length is set to
half and to ``1`` if the injectivity radius is infinite.
"""
-mutable struct DecreasingStepsize <: Stepsize
-    length::Float64
-    factor::Float64
-    subtrahend::Float64
-    exponent::Float64
-    shift::Int
+mutable struct DecreasingStepsize{R<:Real} <: Stepsize
+    length::R
+    factor::R
+    subtrahend::R
+    exponent::R
+    shift::R
    type::Symbol
-    function DecreasingStepsize(
-        l::Real, f::Real=1.0, a::Real=0.0, e::Real=1.0, s::Int=0, type::Symbol=:relative
-    )
-        return new(l, f, a, e, s, type)
-    end
end
function DecreasingStepsize(
-    M::AbstractManifold=DefaultManifold(3);
-    length=isinf(manifold_dimension(M)) ?
1.0 : manifold_dimension(M) / 2,
-    factor=1.0,
-    subtrahend=0.0,
-    exponent=1.0,
-    shift=0,
+    M::AbstractManifold;
+    length::R=isinf(manifold_dimension(M)) ? 1.0 : manifold_dimension(M) / 2,
+    factor::R=1.0,
+    subtrahend::R=0.0,
+    exponent::R=1.0,
+    shift::R=0.0,
    type::Symbol=:relative,
-)
+) where {R}
    return DecreasingStepsize(length, factor, subtrahend, exponent, shift, type)
end
function (s::DecreasingStepsize)(
@@ -174,18 +187,42 @@ get_initial_stepsize(s::DecreasingStepsize) = s.length
function show(io::IO, s::DecreasingStepsize)
    return print(
        io,
-        "DecreasingStepsize(; length=$(s.length), factor=$(s.factor), subtrahend=$(s.subtrahend), shift=$(s.shift))",
+        "DecreasingLength(; length=$(s.length), factor=$(s.factor), subtrahend=$(s.subtrahend), shift=$(s.shift), type=$(s.type))",
    )
end

+"""
+    DecreasingLength(; kwargs...)
+    DecreasingLength(M::AbstractManifold; kwargs...)
+
+Specify a [`Stepsize`](@ref) that is decreasing as ``s_k = $(_tex(:frac, "(l - ak)f^k", "(k+s)^e"))``
+with the following parameters.
+
+# Keyword arguments
+
+* `exponent=1.0`: the exponent ``e`` in the denominator
+* `factor=1.0`: the factor ``f`` in the numerator
+* `length=min(injectivity_radius(M)/2, 1.0)`: the initial step size ``l``.
+* `subtrahend=0.0`: a value ``a`` that is subtracted every iteration
+* `shift=0.0`: shift the denominator iterator ``k`` by ``s``.
+* `type::Symbol=:relative`: specify the type of step size.
+  * `:relative` – scale the gradient tangent vector ``X`` to ``s_k*X``
+  * `:absolute` – scale the gradient to an absolute step length ``s_k``, that is ``$(_tex(:frac, "s_k", _tex(:norm, "X")))X``
+
+$(_note(:ManifoldDefaultFactory, "DecreasingStepsize"))
+"""
+function DecreasingLength(args...; kwargs...)
+    return ManifoldDefaultsFactory(Manopt.DecreasingStepsize, args...; kwargs...)
+end
+
"""
    Linesearch <: Stepsize

An abstract functor to represent line search type step size determinations, see
-[`Stepsize`](@ref) for details.
One example is the [`ArmijoLinesearch`](@ref)
+[`Stepsize`](@ref) for details. One example is the [`ArmijoLinesearchStepsize`](@ref)
functor.
Compared to simple step sizes, the line search functors provide an interface of
-the form `(p,o,i,η) -> s` with an additional (but optional) fourth parameter to
+the form `(p,o,i,X) -> s` with an additional (but optional) fourth parameter to
provide a search direction; this should default to something reasonable, most prominently
the negative gradient.
"""
@@ -203,15 +240,16 @@ function armijo_initial_guess(
end

@doc """
-    ArmijoLinesearch <: Linesearch
+    ArmijoLinesearchStepsize <: Linesearch

-A functor representing Armijo line search including the last runs state string the last stepsize.
+A functor `(problem, state, k, X) -> s` to provide an Armijo line search to compute a step size,
+based on the search direction `X`.

# Fields

* `candidate_point`: to store an interim result
* `initial_stepsize`: and initial step size
-* `retraction_method`: the retraction to use
+$(_var(:Keyword, :retraction_method))
* `contraction_factor`: exponent for line search reduction
* `sufficient_decrease`: gain within Armijo's rule
* `last_stepsize`: the last step size to start the search with
@@ -222,9 +260,6 @@ A functor representing Armijo line search including the last runs state string t
  fulfill. The default accepts all points.
* `additional_increase_condition`: (`(M,p) -> true`) specify a condtion that additionally to
  checking a valid increase has to be fulfilled. The default accepts all points.
-
-Furthermore the following fields act as safeguards
-
* `stop_when_stepsize_less`: smallest stepsize when to stop (the last one before is taken)
* `stop_when_stepsize_exceeds`: largest stepsize when to stop.
* `stop_increasing_at_step`: last step to increase the stepsize (phase 1),
@@ -234,24 +269,15 @@ Pass `:Messages` to a `debug=` to see `@info`s when these happen.
# Constructor

-    ArmijoLinesearch(M=DefaultManifold())
+    ArmijoLinesearchStepsize(M::AbstractManifold; kwargs...)

with the fields keyword arguments and the retraction is set to the default retraction on `M`.

-The constructors return the functor to perform Armijo line search, where
-
-    (a::ArmijoLinesearch)(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, k)
-
-of a [`AbstractManoptProblem`](@ref) `amp`, [`AbstractManoptSolverState`](@ref) `ams` and a current iterate `i`
-with keywords.
-
## Keyword arguments

* `candidate_point=`(`allocate_result(M, rand)`)
-* `η=-`[`get_gradient`](@ref)`(mp, get_iterate(s))`
* `initial_stepsize=1.0`
-* $_kw_retraction_method_default:
-  $_kw_retraction_method
+$(_var(:Keyword, :retraction_method))
* `contraction_factor=0.95`
* `sufficient_decrease=0.1`
* `last_stepsize=initialstepsize`
@@ -261,7 +287,8 @@ with keywords.
* `stop_when_stepsize_less=0.0`
* `stop_when_stepsize_exceeds=max_stepsize(M)`
* `stop_increasing_at_step=100`
* `stop_decreasing_at_step=1000`
"""
-mutable struct ArmijoLinesearch{TRM<:AbstractRetractionMethod,P,I,F,IGF,DF,IF} <: Linesearch
+mutable struct ArmijoLinesearchStepsize{TRM<:AbstractRetractionMethod,P,I,F,IGF,DF,IF} <:
+               Linesearch
    candidate_point::P
    contraction_factor::F
    initial_guess::IGF
@@ -276,8 +303,8 @@ mutable struct ArmijoLinesearch{TRM<:AbstractRetractionMethod,P,I,F,IGF,DF,IF} <
    stop_decreasing_at_step::I
    additional_decrease_condition::DF
    additional_increase_condition::IF
-    function ArmijoLinesearch(
-        M::AbstractManifold=DefaultManifold();
+    function ArmijoLinesearchStepsize(
+        M::AbstractManifold;
        additional_decrease_condition::DF=(M, p) -> true,
        additional_increase_condition::IF=(M, p) -> true,
        candidate_point::P=allocate_result(M, rand),
@@ -309,7 +336,7 @@ mutable struct ArmijoLinesearch{TRM<:AbstractRetractionMethod,P,I,F,IGF,DF,IF} <
    )
    end
end
-function (a::ArmijoLinesearch)(
+function (a::ArmijoLinesearchStepsize)(
    mp::AbstractManoptProblem,
    s::AbstractManoptSolverState,
    k::Int,
@@ -320,7 +347,7 @@ function (a::ArmijoLinesearch)(
    X = get_gradient!(mp,
get_gradient(s), p) return a(mp, p, X, η; initial_guess=a.initial_guess(mp, s, k, a.last_stepsize)) end -function (a::ArmijoLinesearch)( +function (a::ArmijoLinesearchStepsize)( mp::AbstractManoptProblem, p, X, η; initial_guess=1.0, kwargs... ) l = norm(get_manifold(mp), p, η) @@ -344,36 +371,256 @@ function (a::ArmijoLinesearch)( ) return a.last_stepsize end -get_initial_stepsize(a::ArmijoLinesearch) = a.initial_stepsize -function show(io::IO, als::ArmijoLinesearch) +get_initial_stepsize(a::ArmijoLinesearchStepsize) = a.initial_stepsize +function show(io::IO, als::ArmijoLinesearchStepsize) return print( io, """ - ArmijoLinesearch() with keyword parameters - * initial_stepsize = $(als.initial_stepsize) - * retraction_method = $(als.retraction_method) - * contraction_factor = $(als.contraction_factor) - * sufficient_decrease = $(als.sufficient_decrease)""", + ArmijoLinesearch(; + initial_stepsize=$(als.initial_stepsize) + retraction_method=$(als.retraction_method) + contraction_factor=$(als.contraction_factor) + sufficient_decrease=$(als.sufficient_decrease) + )""", ) end -function status_summary(als::ArmijoLinesearch) +function status_summary(als::ArmijoLinesearchStepsize) return "$(als)\nand a computed last stepsize of $(als.last_stepsize)" end -get_message(a::ArmijoLinesearch) = a.message -function get_manopt_parameter(a::ArmijoLinesearch, s::Val{:DecreaseCondition}, args...) - return get_manopt_parameter(a.additional_decrease_condition, args...) +get_message(a::ArmijoLinesearchStepsize) = a.message +function get_parameter(a::ArmijoLinesearchStepsize, s::Val{:DecreaseCondition}, args...) + return get_parameter(a.additional_decrease_condition, args...) end -function get_manopt_parameter(a::ArmijoLinesearch, ::Val{:IncreaseCondition}, args...) - return get_manopt_parameter(a.additional_increase_condition, args...) +function get_parameter(a::ArmijoLinesearchStepsize, ::Val{:IncreaseCondition}, args...) + return get_parameter(a.additional_increase_condition, args...) 
end -function set_manopt_parameter!(a::ArmijoLinesearch, s::Val{:DecreaseCondition}, args...) -    set_manopt_parameter!(a.additional_decrease_condition, args...) +function set_parameter!(a::ArmijoLinesearchStepsize, s::Val{:DecreaseCondition}, args...) +    set_parameter!(a.additional_decrease_condition, args...)     return a end -function set_manopt_parameter!(a::ArmijoLinesearch, ::Val{:IncreaseCondition}, args...) -    set_manopt_parameter!(a.additional_increase_condition, args...) +function set_parameter!(a::ArmijoLinesearchStepsize, ::Val{:IncreaseCondition}, args...) +    set_parameter!(a.additional_increase_condition, args...)     return a end +""" +    ArmijoLinesearch(; kwargs...) +    ArmijoLinesearch(M::AbstractManifold; kwargs...) + +Specify a step size that performs an Armijo line search. Given a function ``f:$(_math(:M))→ℝ``, +its Riemannian gradient ``$(_tex(:grad))f: $(_math(:M))→$(_math(:TM))``, +the current point ``p∈$(_math(:M))``, and a search direction ``X∈$(_math(:TpM))``, +the step size ``s`` is found by reducing an initial step size ``s`` until + +```math +f($(_tex(:retr))_p(sX)) ≤ f(p) - τs ⟨ X, $(_tex(:grad))f(p) ⟩_p +``` + +is fulfilled for a sufficient decrease value ``τ ∈ (0,1)``. + +To be a bit more optimistic, if ``s`` already fulfils this condition, a first search is done, +__increasing__ the given ``s`` until this condition fails for the first time. + +Overall, this looks for a step size that provides _enough decrease_, see +[Boumal:2023; p. 58](@cite) for more information. + +# Keyword arguments + +* `additional_decrease_condition=(M, p) -> true`: +  specify an additional criterion that has to be met to accept a step size in the decreasing loop +* `additional_increase_condition::IF=(M, p) -> true`: +  specify an additional criterion that has to be met to accept a step size in the (initial) increase loop +* `candidate_point=allocate_result(M, rand)`: +  specify a point to be used as memory for the candidate points.
+* `contraction_factor=0.95`: how to update ``s`` in the decrease step +* `initial_stepsize=1.0`: specify an initial step size +* `initial_guess=armijo_initial_guess`: instead of the initial step, start with this guess. +$(_var(:Keyword, :retraction_method)) +* `stop_when_stepsize_less=0.0`: a safeguard, stop when the decreasing step is below this (nonnegative) bound. +* `stop_when_stepsize_exceeds=max_stepsize(M)`: a safeguard to avoid choosing a too large step size when initially increasing +* `stop_increasing_at_step=100`: stop the initial increasing loop after this number of steps. Set to `0` to never increase in the beginning +* `stop_decreasing_at_step=1000`: maximal number of Armijo decreases / tests to perform +* `sufficient_decrease=0.1`: the sufficient decrease parameter ``τ`` + +For the stop safeguards you can pass `:Messages` to a `debug=` to see `@info` messages when these happen. + +$(_note(:ManifoldDefaultFactory, "ArmijoLinesearchStepsize")) +""" +function ArmijoLinesearch(args...; kwargs...) +    return ManifoldDefaultsFactory(Manopt.ArmijoLinesearchStepsize, args...; kwargs...) +end + +@doc """ +    AdaptiveWNGradientStepsize{I<:Integer,R<:Real,F<:Function} <: Stepsize + +A functor `(problem, state, k, X) -> s` implementing an adaptive gradient method introduced by [GrapigliaStella:2023](@cite). +See [`AdaptiveWNGradient`](@ref) for the mathematical details. + +# Fields + +* `count_threshold::I`: an `Integer` for ``$(_tex(:hat, "c"))`` +* `minimal_bound::R`: the value for ``b_{$(_tex(:text, "min"))}`` +* `alternate_bound::F`: how to determine ``$(_tex(:hat, "b"))_k`` as a function of `(bmin, bk, hat_c) -> hat_bk` +* `gradient_reduction::R`: the gradient reduction factor threshold ``α ∈ [0,1)`` +* `gradient_bound::R`: the bound ``b_k``. +* `weight::R`: ``ω_k`` initialised to ``ω_0 = ```norm(M, p, X)` if this is not zero, `1.0` otherwise. +* `count::I`: ``c_k``, initialised to ``c_0 = 0``. + +# Constructor + +    AdaptiveWNGradientStepsize(M::AbstractManifold; kwargs...)
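The recursion these fields implement can be sketched for the Euclidean case in Python; this is an illustration only (`wngrad_update` is a hypothetical helper with a simplified alternate bound, not the Manopt.jl API):

```python
# Sketch of the WNGrad-style update of (b, ω, c) and the step size 1/b,
# Euclidean case; hypothetical helper, simplified alternate bound.
def wngrad_update(b, omega, count, grad_norm, alpha=0.9, count_threshold=4,
                  minimal_bound=1e-4):
    """Return the updated (b, omega, count) and the step size 1/b."""
    if grad_norm <= alpha * omega:  # enough gradient reduction observed
        if count + 1 == count_threshold:
            b = max(minimal_bound, b / (3 * count_threshold))  # alternate bound
            omega, count = grad_norm, 0
        else:
            b, count = b + grad_norm**2 / b, count + 1  # omega stays unchanged
    else:
        b, count = b + grad_norm**2 / b, 0  # omega stays unchanged
    return b, omega, count, 1.0 / b
```

For instance, a gradient norm of `2.0` with `omega = 1.0` grows the bound from `1.0` to `5.0` and yields the step size `0.2`.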
+ +## Keyword arguments + +* `adaptive=true`: switches the `gradient_reduction` ``α`` between `0.9` (if `true`) and `0.0` (if `false`). +* `alternate_bound = (bk, hat_c) -> min(gradient_bound == 0 ? 1.0 : gradient_bound, max(minimal_bound, bk / (3 * hat_c))` +* `count_threshold=4` +* `gradient_reduction::R=adaptive ? 0.9 : 0.0` +* `gradient_bound=norm(M, p, X)` +* `minimal_bound=1e-4` +$(_var(:Keyword, :p; add="only used to define the `gradient_bound`")) +$(_var(:Keyword, :X; add="only used to define the `gradient_bound`")) +""" +mutable struct AdaptiveWNGradientStepsize{I<:Integer,R<:Real,F<:Function} <: Stepsize +    count_threshold::I +    minimal_bound::R +    alternate_bound::F +    gradient_reduction::R +    gradient_bound::R +    weight::R +    count::I +end +function AdaptiveWNGradientStepsize( +    M::AbstractManifold; +    p=rand(M), +    X=zero_vector(M, p), +    adaptive::Bool=true, +    count_threshold::I=4, +    minimal_bound::R=1e-4, +    gradient_reduction::R=adaptive ? 0.9 : 0.0, +    gradient_bound::R=norm(M, p, X), +    alternate_bound::F=(bk, hat_c) -> min( +        gradient_bound == 0 ? 1.0 : gradient_bound, max(minimal_bound, bk / (3 * hat_c)) +    ), +    kwargs..., +) where {I<:Integer,R<:Real,F<:Function} +    if gradient_bound == 0 +        # If the gradient bound defaults to zero, set it to 1 +        gradient_bound = 1.0 +    end +    return AdaptiveWNGradientStepsize{I,R,F}( +        count_threshold, +        minimal_bound, +        alternate_bound, +        gradient_reduction, +        gradient_bound, +        gradient_bound, +        0, +    ) +end +function (awng::AdaptiveWNGradientStepsize)( +    mp::AbstractManoptProblem, s::AbstractGradientSolverState, i, args...; kwargs...
+) +    M = get_manifold(mp) +    p = get_iterate(s) +    X = get_gradient(mp, p) +    isnan(awng.weight) || (awng.weight = norm(M, p, X)) # init ω_0 +    if i == 0 # init fields +        awng.weight = norm(M, p, X) # init ω_0 +        (awng.weight == 0) && (awng.weight = 1.0) +        awng.count = 0 +        return 1 / awng.gradient_bound +    end +    grad_norm = norm(M, p, X) +    if grad_norm < awng.gradient_reduction * awng.weight # grad norm < αω_{k-1} +        if awng.count + 1 == awng.count_threshold +            awng.gradient_bound = awng.alternate_bound( +                awng.gradient_bound, awng.count_threshold +            ) +            awng.weight = grad_norm +            awng.count = 0 +        else +            awng.gradient_bound = awng.gradient_bound + grad_norm^2 / awng.gradient_bound +            #weight stays unchanged +            awng.count += 1 +        end +    else +        awng.gradient_bound = awng.gradient_bound + grad_norm^2 / awng.gradient_bound +        #weight stays unchanged +        awng.count = 0 +    end +    return 1 / awng.gradient_bound +end +get_initial_stepsize(awng::AdaptiveWNGradientStepsize) = 1 / awng.gradient_bound +get_last_stepsize(awng::AdaptiveWNGradientStepsize) = 1 / awng.gradient_bound +function show(io::IO, awng::AdaptiveWNGradientStepsize) +    s = """ +    AdaptiveWNGradient(; +        count_threshold=$(awng.count_threshold), +        minimal_bound=$(awng.minimal_bound), +        alternate_bound=$(awng.alternate_bound), +        gradient_reduction=$(awng.gradient_reduction), +        gradient_bound=$(awng.gradient_bound) +    ) + +    as well as internally the weight ω_k = $(awng.weight) and current count c_k = $(awng.count). +    """ +    return print(io, s) +end +""" +    AdaptiveWNGradient(; kwargs...) +    AdaptiveWNGradient(M::AbstractManifold; kwargs...) + +A stepsize based on the adaptive gradient method introduced by [GrapigliaStella:2023](@cite). + +Given a positive threshold ``$(_tex(:hat, "c")) ∈ ℕ``, +a minimal bound ``b_{$(_tex(:text, "min"))} > 0``, +an initial ``b_0 ≥ b_{$(_tex(:text, "min"))}``, and a +gradient reduction factor threshold ``α ∈ [0,1)``. + +Set ``c_0=0`` and use ``ω_0 = $(_tex(:norm, "$(_tex(:grad)) f(p_0)"; index="p_0"))``.
+ +For the first iterate use the initial step size ``s_0 = $(_tex(:frac, "1", "b_0"))``. + +Then, given the last gradient ``X_{k-1} = $(_tex(:grad)) f(p_{k-1})``, +and a previous ``ω_{k-1}``, the values ``(b_k, ω_k, c_k)`` are computed +using ``X_k = $(_tex(:grad)) f(p_k)`` and the following cases: + +If ``$(_tex(:norm, "X_k"; index="p_k")) ≤ αω_{k-1}``, then let +``$(_tex(:hat, "b"))_{k-1} ∈ [b_{$(_tex(:text, "min"))},b_{k-1}]`` and set + +```math +(b_k, ω_k, c_k) = $(_tex(:cases, +"$(_tex(:bigl))($(_tex(:hat, "b"))_{k-1}, $(_tex(:norm, "X_k"; index="p_k")), 0 $(_tex(:bigr))) & $(_tex(:text, " if ")) c_{k-1}+1 = $(_tex(:hat, "c"))", +"$(_tex(:bigl))( b_{k-1} + $(_tex(:frac, _tex(:norm, "X_k"; index="p_k")*"^2", "b_{k-1}")), ω_{k-1}, c_{k-1}+1 $(_tex(:bigr))) & $(_tex(:text, " if ")) c_{k-1}+1<$(_tex(:hat, "c"))", +)) +``` + +If ``$(_tex(:norm, "X_k"; index="p_k")) > αω_{k-1}``, then set + +```math +(b_k, ω_k, c_k) = $(_tex(:Bigl))( b_{k-1} + $(_tex(:frac, _tex(:norm, "X_k"; index="p_k")*"^2", "b_{k-1}")), ω_{k-1}, 0 $(_tex(:Bigr))) +``` + +and return the step size ``s_k = $(_tex(:frac, "1", "b_k"))``. + +Note that for ``α=0`` this is the Riemannian variant of `WNGRad`. + +## Keyword arguments + +* `adaptive=true`: switches the `gradient_reduction` ``α`` between `0.9` (if `true`) and `0.0` (if `false`). +* `alternate_bound = (bk, hat_c) -> min(gradient_bound == 0 ? 1.0 : gradient_bound, max(minimal_bound, bk / (3 * hat_c))`: +  how to determine ``$(_tex(:hat, "b"))_k`` as a function of `(bmin, bk, hat_c) -> hat_bk` +* `count_threshold=4`: an `Integer` for ``$(_tex(:hat, "c"))`` +* `gradient_reduction::R=adaptive ? 0.9 : 0.0`: the gradient reduction factor threshold ``α ∈ [0,1)`` +* `gradient_bound=norm(M, p, X)`: the bound ``b_k``. +* `minimal_bound=1e-4`: the value ``b_{$(_tex(:text, "min"))}`` +$(_var(:Keyword, :p; add="only used to define the `gradient_bound`")) +$(_var(:Keyword, :X; add="only used to define the `gradient_bound`")) +""" +function AdaptiveWNGradient(args...; kwargs...)
+ return ManifoldDefaultsFactory(Manopt.AdaptiveWNGradientStepsize, args...; kwargs...) +end @doc """ (s, msg) = linesearch_backtrack(M, F, p, X, s, decrease, contract η = -X, f0 = f(p); kwargs...) @@ -393,8 +640,7 @@ perform a line search ## Keyword arguments -* $_kw_retraction_method_default - $_kw_retraction_method +$(_var(:Keyword, :retraction_method)) * `stop_when_stepsize_less=0.0`: to avoid numerical underflow * `stop_when_stepsize_exceeds=`[`max_stepsize`](@ref)`(M, p) / norm(M, p, η)`) to avoid leaving the injectivity radius on a manifold * `stop_increasing_at_step=100`: stop the initial increase of step size after these many steps @@ -490,92 +736,10 @@ function linesearch_backtrack!( return (s, msg) end -_doc_NM_linesearch = raw""" -```math -y_{k} = \operatorname{grad}F(x_{k}) - \operatorname{T}_{x_{k-1} → x_k}(\operatorname{grad}F(x_{k-1})) -``` -""" - -_doc_NM_linesearch2 = raw""" -```math -s_{k} = - α_{k-1} * \operatorname{T}_{x_{k-1} → x_k}(\operatorname{grad}F(x_{k-1})), -``` -""" - -_doc_NM_BB = raw""" -```math -α_k^{\text{BB}} = \begin{cases} -\min(α_{\text{max}}, \max(α_{\text{min}}, τ_{k})), & \text{if } ⟨s_{k}, y_{k}⟩_{x_k} > 0,\\ -α_{\text{max}}, & \text{else,} -\end{cases} -``` -""" - -_doc_NM_BB_direct = raw""" -```math -τ_{k} = \frac{⟨s_{k}, s_{k}⟩_{x_k}}{⟨s_{k}, y_{k}⟩_{x_k}}, -``` -""" - -_doc_NM_BB_indirect = raw""" -```math -τ_{k} = \frac{⟨s_{k}, s_{k}⟩_{x_k}}{⟨s_{k}, y_{k}⟩_{x_k}}, -``` -""" - -_doc_NM_BB_h = raw""" -```math -F(\operatorname{retr}_{x_k}(- σ^h α_k^{\text{BB}} \operatorname{grad}F(x_k))) -\leq -\max_{1 ≤ j ≤ \min(k+1,m)} F(x_{k+1-j}) - γ σ^h α_k^{\text{BB}} ⟨\operatorname{grad}F(x_k), \operatorname{grad}F(x_k)⟩_{x_k}, -``` -""" - -_doc_NM_final = raw""" -```math -α_k = σ^h α_k^{\text{BB}}. -``` -""" - @doc """ - NonmonotoneLinesearch <: Linesearch + NonmonotoneLinesearchStepsize{P,T,R<:Real} <: Linesearch A functor representing a nonmonotone line search using the Barzilai-Borwein step size [IannazzoPorcelli:2017](@cite). 
-Together with a gradient descent algorithm this line search represents the Riemannian Barzilai-Borwein with nonmonotone line-search (RBBNMLS) algorithm. -The order is shifted in comparison of the algorithm steps from the paper -by Iannazzo and Porcelli so that in each iteration this line search first finds - -$_doc_NM_linesearch - -and - -$_doc_NM_linesearch2 - -where ``α_{k-1}`` is the step size computed in the last iteration and $_l_vt is a vector transport. -Then the Barzilai—Borwein step size is - -$_doc_NM_BB - -where - -$_doc_NM_BB_direct - -if the direct strategy is chosen, - -$_doc_NM_BB_indirect - -in case of the inverse strategy and an alternation between the two in case of the -alternating strategy. Then find the smallest ``h = 0, 1, 2, …`` such that - -$_doc_NM_BB_h - -where ``σ`` is a step length reduction factor ``∈ (0,1)``, ``m`` is the number of iterations -after which the function value has to be lower than the current one -and ``γ`` is the sufficient decrease parameter ``∈(0,1)``. 
- -Then find the new stepsize by - -$_doc_NM_final # Fields @@ -583,37 +747,30 @@ $_doc_NM_final * `memory_size=10`: number of iterations after which the cost value needs to be lower than the current one * `bb_min_stepsize=1e-3`: lower bound for the Barzilai-Borwein step size greater than zero * `bb_max_stepsize=1e3`: upper bound for the Barzilai-Borwein step size greater than min_stepsize -* `retraction_method`: the retraction to use -* `strategy=direct`: defines if the new step size is computed using the direct, indirect or alternating strategy +$(_var(:Keyword, :retraction_method)) +* `strategy=direct`: defines if the new step size is computed using the `:direct`, `:inverse` or `:alternating` strategy * `storage`: (for `:Iterate` and `:Gradient`) a [`StoreStateAction`](@ref) -* `stepsize_reduction=0.5`: step size reduction factor contained in the interval (0,1) -* `sufficient_decrease=1e-4`: sufficient decrease parameter contained in the interval (0,1) -* `vector_transport_method`: the vector transport method to use +* `stepsize_reduction`: step size reduction factor contained in the interval (0,1) +* `sufficient_decrease`: sufficient decrease parameter contained in the interval (0,1) +$(_var(:Keyword, :vector_transport_method)) * `candidate_point`: to store an interim result - -Furthermore the following fields act as safeguards - * `stop_when_stepsize_less`: smallest stepsize when to stop (the last one before is taken) * `stop_when_stepsize_exceeds`: largest stepsize when to stop. * `stop_increasing_at_step`: last step to increase the stepsize (phase 1), * `stop_decreasing_at_step`: last step size to decrease the stepsize (phase 2), -Pass `:Messages` to a `debug=` to see `@info`s when these happen. - # Constructor -    NonmonotoneLinesearch(M=DefaultManifold(); kwargs...) - -geerate the monotone linesearch +    NonmonotoneLinesearchStepsize(M::AbstractManifold; kwargs...)
## Keyword arguments -* `candidate_point=allocate_result(M, rand)`: to store an interim result +* `p=allocate_result(M, rand)`: to store an interim result * `initial_stepsize=1.0` * `memory_size=10` * `bb_min_stepsize=1e-3` * `bb_max_stepsize=1e3` -* $_kw_retraction_method_default +$(_var(:Keyword, :retraction_method)) * `strategy=direct` * `storage=[`StoreStateAction`](@ref)`(M; store_fields=[:Iterate, :Gradient])`` * `stepsize_reduction=0.5` @@ -622,53 +779,53 @@ geerate the monotone linesearch * `stop_when_stepsize_exceeds=`[`max_stepsize`](@ref)`(M, p)`) * `stop_increasing_at_step=100` * `stop_decreasing_at_step=1000` -* $_kw_vector_transport_method_default - -The constructor return the functor to perform nonmonotone line search. +$(_var(:Keyword, :vector_transport_method)) """ -mutable struct NonmonotoneLinesearch{ +mutable struct NonmonotoneLinesearchStepsize{ + P, + T<:AbstractVector, + R<:Real, + I<:Integer, TRM<:AbstractRetractionMethod, VTM<:AbstractVectorTransportMethod, - T<:AbstractVector, TSSA<:StoreStateAction, - P, } <: Linesearch - bb_min_stepsize::Float64 - bb_max_stepsize::Float64 + bb_min_stepsize::R + bb_max_stepsize::R candiate_point::P - initial_stepsize::Float64 + initial_stepsize::R message::String old_costs::T retraction_method::TRM - stepsize_reduction::Float64 - stop_decreasing_at_step::Int - stop_increasing_at_step::Int - stop_when_stepsize_exceeds::Float64 - stop_when_stepsize_less::Float64 + stepsize_reduction::R + stop_decreasing_at_step::I + stop_increasing_at_step::I + stop_when_stepsize_exceeds::R + stop_when_stepsize_less::R storage::TSSA strategy::Symbol - sufficient_decrease::Float64 + sufficient_decrease::R vector_transport_method::VTM - function NonmonotoneLinesearch( - M::AbstractManifold=DefaultManifold(); - bb_min_stepsize::Float64=1e-3, - bb_max_stepsize::Float64=1e3, - candidate_point::P=allocate_result(M, rand), - initial_stepsize::Float64=1.0, - memory_size::Int=10, + function NonmonotoneLinesearchStepsize( + 
M::AbstractManifold; + bb_min_stepsize::R=1e-3, + bb_max_stepsize::R=1e3, + p::P=allocate_result(M, rand), + initial_stepsize::R=1.0, + memory_size::I=10, retraction_method::TRM=default_retraction_method(M), - stepsize_reduction::Float64=0.5, - stop_when_stepsize_less::Float64=0.0, - stop_when_stepsize_exceeds::Float64=max_stepsize(M), - stop_increasing_at_step::Int=100, - stop_decreasing_at_step::Int=1000, + stepsize_reduction::R=0.5, + stop_when_stepsize_less::R=0.0, + stop_when_stepsize_exceeds::R=max_stepsize(M), + stop_increasing_at_step::I=100, + stop_decreasing_at_step::I=1000, storage::Union{Nothing,StoreStateAction}=StoreStateAction( M; store_fields=[:Iterate, :Gradient] ), strategy::Symbol=:direct, - sufficient_decrease::Float64=1e-4, + sufficient_decrease::R=1e-4, vector_transport_method::VTM=default_vector_transport_method(M), - ) where {TRM,VTM,P} + ) where {TRM,VTM,P,R<:Real,I<:Integer} if strategy ∉ [:direct, :inverse, :alternating] @warn string( "The strategy '", @@ -696,13 +853,14 @@ mutable struct NonmonotoneLinesearch{ if memory_size <= 0 throw(DomainError(memory_size, "The memory_size has to be greater than zero.")) end - return new{TRM,VTM,Vector{Float64},typeof(storage),P}( + old_costs = zeros(memory_size) + return new{P,typeof(old_costs),R,I,TRM,VTM,typeof(storage)}( bb_min_stepsize, bb_max_stepsize, - candidate_point, + p, initial_stepsize, "", - zeros(memory_size), + old_costs, retraction_method, stepsize_reduction, stop_decreasing_at_step, @@ -716,7 +874,7 @@ mutable struct NonmonotoneLinesearch{ ) end end -function (a::NonmonotoneLinesearch)( +function (a::NonmonotoneLinesearchStepsize)( mp::AbstractManoptProblem, s::AbstractManoptSolverState, k::Int, @@ -744,7 +902,7 @@ function (a::NonmonotoneLinesearch)( k, ) end -function (a::NonmonotoneLinesearch)( +function (a::NonmonotoneLinesearchStepsize)( M::mT, p, f::TF, X::T, η::T, old_p, old_X, iter::Int; kwargs... 
) where {mT<:AbstractManifold,TF,T} #find the difference between the current and previous gradient after the previous gradient is transported to the current tangent space @@ -822,45 +980,111 @@ function (a::NonmonotoneLinesearch)( ) return a.initial_stepsize end -function show(io::IO, a::NonmonotoneLinesearch) +function show(io::IO, a::NonmonotoneLinesearchStepsize) return print( io, """ - NonmonotoneLinesearch() with keyword arguments - * initial_stepsize = $(a.initial_stepsize) - * bb_max_stepsize = $(a.bb_max_stepsize) - * bb_min_stepsize = $(a.bb_min_stepsize), - * memory_size = $(length(a.old_costs)) - * stepsize_reduction = $(a.stepsize_reduction) - * strategy = :$(a.strategy) - * sufficient_decrease = $(a.sufficient_decrease) - * retraction_method = $(a.retraction_method) - * vector_transport_method = $(a.vector_transport_method)""", + NonmonotoneLinesearch(; + initial_stepsize = $(a.initial_stepsize) + bb_max_stepsize = $(a.bb_max_stepsize) + bb_min_stepsize = $(a.bb_min_stepsize), + memory_size = $(length(a.old_costs)) + stepsize_reduction = $(a.stepsize_reduction) + strategy = :$(a.strategy) + sufficient_decrease = $(a.sufficient_decrease) + retraction_method = $(a.retraction_method) + vector_transport_method = $(a.vector_transport_method + )""", ) end -get_message(a::NonmonotoneLinesearch) = a.message +get_message(a::NonmonotoneLinesearchStepsize) = a.message -@doc raw""" - PolyakStepsize <: Stepsize +@doc """ + NonmonotoneLinesearch(; kwargs...) + NonmonotoneLinesearch(M::AbstractManifold; kwargs...) +A functor representing a nonmonotone line search using the Barzilai-Borwein step size [IannazzoPorcelli:2017](@cite). 
-Compute a step size +This method first computes -````math -α_i = \frac{f(p^{(i)}) - f_{\text{best}}+γ_i}{\lVert{ ∂f(p^{(i)})} \rVert^2}, -```` +```math +y_{k} = $(_tex(:grad))f(p_{k}) - $(_math(:vector_transport, :symbol, "p_k", "p_{k-1}"))$(_tex(:grad))f(p_{k-1}) +``` -where ``f_{\text{best}}`` is the best cost value seen until the current iteration, -and ``γ_i`` is a sequence of numbers that is square summable, but not summable (the sum must diverge to infinity). -Furthermore ``∂f`` denotes a subgradient of ``f`` at the current iterate ``p^{(i)}``. +and +```math +s_{k} = - α_{k-1} ⋅ $(_math(:vector_transport, :symbol, "p_k", "p_{k-1}"))$(_tex(:grad))f(p_{k-1}), +``` + +where ``α_{k-1}`` is the step size computed in the last iteration and ``$(_math(:vector_transport, :symbol))`` is a vector transport. +Then the Barzilai—Borwein step size is -The step size is an adaption from the Dynamic step size discussed in Section 3.2 of [Bertsekas:2015](@cite), -both to the Riemannian case and to approximate the minimum cost value. +```math +α_k^{$(_tex(:text, "BB"))} = $(_tex(:cases, +"$(_tex(:min))(α_{$(_tex(:text, "max"))}, $(_tex(:max))(α_{$(_tex(:text, "min"))}, τ_{k})), & $(_tex(:text, "if")) ⟨s_{k}, y_{k}⟩_{p_k} > 0,", +"α_{$(_tex(:text, "max"))}, & $(_tex(:text, "else,"))" +)) +``` + +where + +```math +τ_{k} = $(_tex(:frac, "⟨s_{k}, s_{k}⟩_{p_k}", "⟨s_{k}, y_{k}⟩_{p_k}")), +``` + +if the direct strategy is chosen, or + +```math +τ_{k} = $(_tex(:frac, "⟨s_{k}, y_{k}⟩_{p_k}", "⟨y_{k}, y_{k}⟩_{p_k}")), +``` + +in case of the inverse strategy, or an alternation between the two in case of +the alternating strategy.
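The three ``τ_k`` strategies can be sketched for the Euclidean case as follows; this is an illustration only (`bb_stepsize` is a hypothetical Python helper, not part of Manopt.jl):

```python
# Euclidean sketch of the Barzilai-Borwein step size with the direct,
# inverse, and alternating strategies described above; hypothetical helper.
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bb_stepsize(s, y, k, strategy="direct", bb_min=1e-3, bb_max=1e3):
    sy = _dot(s, y)
    if sy <= 0:  # second case of the definition: fall back to the upper bound
        return bb_max
    if strategy == "direct":
        tau = _dot(s, s) / sy
    elif strategy == "inverse":
        tau = sy / _dot(y, y)
    else:  # alternating: switch between the two by iteration number
        tau = _dot(s, s) / sy if k % 2 == 0 else sy / _dot(y, y)
    return min(bb_max, max(bb_min, tau))
```

The clamping in the last line corresponds to the `bb_min_stepsize` and `bb_max_stepsize` bounds of the keyword arguments.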
Then find the smallest ``h = 0, 1, 2, …`` such that + +```math +f($(_tex(:retr))_{p_k}(- σ^h α_k^{$(_tex(:text, "BB"))} $(_tex(:grad))f(p_k))) ≤ +$(_tex(:max))_{1 ≤ j ≤ $(_tex(:min))(k+1,m)} f(p_{k+1-j}) - γ σ^h α_k^{$(_tex(:text, "BB"))} ⟨$(_tex(:grad))f(p_k), $(_tex(:grad))f(p_k)⟩_{p_k}, +``` + +where ``σ ∈ (0,1)`` is a step length reduction factor, ``m`` is the number of iterations +after which the function value has to be lower than the current one +and ``γ ∈ (0,1)`` is the sufficient decrease parameter. Finally the step size is computed as + +```math +α_k = σ^h α_k^{$(_tex(:text, "BB"))}. +``` + +# Keyword arguments + +$(_var(:Keyword, :p; add="to store an interim result")) +* `initial_stepsize=1.0`: the step size to start the search with +* `memory_size=10`: number of iterations after which the cost value needs to be lower than the current one +* `bb_min_stepsize=1e-3`: lower bound for the Barzilai-Borwein step size greater than zero +* `bb_max_stepsize=1e3`: upper bound for the Barzilai-Borwein step size greater than min_stepsize +$(_var(:Keyword, :retraction_method)) +* `strategy=direct`: defines if the new step size is computed using the `:direct`, `:inverse` or `:alternating` strategy +* `storage=`[`StoreStateAction`](@ref)`(M; store_fields=[:Iterate, :Gradient])`: increase efficiency by using a [`StoreStateAction`](@ref) for `:Iterate` and `:Gradient`.
+* `stepsize_reduction=0.5`: step size reduction factor contained in the interval ``(0,1)`` +* `sufficient_decrease=1e-4`: sufficient decrease parameter contained in the interval ``(0,1)`` +* `stop_when_stepsize_less=0.0`: smallest stepsize when to stop (the last one before is taken) +* `stop_when_stepsize_exceeds=`[`max_stepsize`](@ref)`(M, p)`: largest stepsize when to stop to avoid leaving the injectivity radius +* `stop_increasing_at_step=100`: last step to increase the stepsize (phase 1), +* `stop_decreasing_at_step=1000`: last step size to decrease the stepsize (phase 2), +""" +NonmonotoneLinesearch(args...; kwargs...) = +    ManifoldDefaultsFactory(NonmonotoneLinesearchStepsize, args...; kwargs...) + +@doc """ +    PolyakStepsize <: Stepsize + +A functor `(problem, state, ...) -> s` to provide a step size due to Polyak, cf. Section 3.2 of [Bertsekas:2015](@cite). # Fields -* `γ` : a function `i -> ...` representing the sequence. -* `best_cost_value` : storing the value `f_{\text{best}}` +* `γ` : a function `k -> ...` representing a sequence. +* `best_cost_value` : storing the best cost value # Constructor @@ -869,7 +1093,10 @@ both to the Riemannian case and to approximate the minimum cost value.         initial_cost_estimate=0.0     ) -initialize the Polyak stepsize to a certain sequence and an initial estimate of ``f_{\text{best}}`` +Construct a stepsize of Polyak type. + +# See also +[`Polyak`](@ref) """ mutable struct PolyakStepsize{F,R} <: Stepsize     γ::F @@ -893,84 +1120,108 @@ end function show(io::IO, ps::PolyakStepsize)     return print(         io, -        "PolyakStepsize() with keyword parameters\n   * initial_cost_estimate = $(ps.best_cost_value)", +        """ +        Polyak() +        A stepsize with keyword parameters +          * initial_cost_estimate = $(ps.best_cost_value) +        """,     ) end +""" +    Polyak(; kwargs...) +    Polyak(M::AbstractManifold; kwargs...) -@doc raw""" -    WolfePowellLinesearch <: Linesearch +Compute a step size according to a method proposed by Polyak, cf.
+ the Dynamic step size +discussed in Section 3.2 of [Bertsekas:2015](@cite). +This has been generalised here to both the Riemannian case and to approximate the minimum cost value. -Do a backtracking line search to find a step size ``α`` that fulfils the -Wolfe conditions along a search direction ``η`` starting from ``x`` by +Let ``f_{$(_tex(:text, "best"))}`` be the best cost value seen until now during some iterative +optimisation algorithm and let ``γ_k`` be a sequence of numbers that is square summable, but not summable. + +Then the step size computed here reads ```math -f\bigl( \operatorname{retr}_x(αη) \bigr) ≤ f(x_k) + c_1 α_k ⟨\operatorname{grad}f(x), η⟩_x -\quad\text{and}\quad -\frac{\mathrm{d}}{\mathrm{d}t} f\bigr(\operatorname{retr}_x(tη)\bigr) -\Big\vert_{t=α} -≥ c_2 \frac{\mathrm{d}}{\mathrm{d}t} f\bigl(\operatorname{retr}_x(tη)\bigr)\Big\vert_{t=0}. +s_k = $(_tex(:frac, "f(p^{(k)}) - f_{$(_tex(:text, "best"))} + γ_k", _tex(:norm, "∂f(p^{(k)})")*"^2")), ``` -# Constructors +where ``∂f`` denotes a nonzero subgradient of ``f`` at the current iterate ``p^{(k)}``. -There exist two constructors, where, when provided the manifold `M` as a first (optional) -parameter, its default retraction and vector transport are the default. -In this case the retraction and the vector transport are also keyword arguments for ease of use. -The other constructor is kept for backward compatibility. -Note that the `stop_when_stepsize_less` to stop for too small stepsizes is only available in the -new signature including `M`. -    WolfePowellLinesearch(M, c1::Float64=10^(-4), c2::Float64=0.999; kwargs... +# Constructor -Generate a Wolfe-Powell line search +    Polyak(; γ = k -> 1/k, initial_cost_estimate=0.0) -## Keyword arguments +initialize the Polyak stepsize to a certain sequence and an initial estimate of ``f_{$(_tex(:text, "best"))}``.
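In the Euclidean setting the resulting rule is a one-liner; as an illustration (`polyak_step` is a hypothetical Python helper, not the Manopt.jl API):

```python
# Euclidean sketch of the Polyak step size described above:
# s_k = (f(p_k) - f_best + γ_k) / ||∂f(p_k)||^2; hypothetical helper.
def polyak_step(cost, best_cost, subgrad_norm, gamma_k):
    """Return the Polyak step size for the current iterate."""
    return (cost - best_cost + gamma_k) / subgrad_norm**2
```

For example, with current cost `2.0`, best cost `1.0`, subgradient norm `2.0`, and `γ_k = 0`, the step size is `0.25`.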
-* `candidate_point=allocate_result(M, rand)`: memory for a candidate -* `candidate_tangent=allocate_result(M, zero_vector, candidate_point)`: memory for a gradient -* `candidate_direcntion=allocate_result(M, zero_vector, candidate_point)`: memory for a direction -* `max_stepsize`: ([`max_stepsize`](@ref)`(M, p)`) largest stepsize allowed here. -* `retraction_method=ExponentialRetraction()`: the retraction to use +$(_note(:ManifoldDefaultFactory, "PolyakStepsize")) +""" +function Polyak(args...; kwargs...) + return ManifoldDefaultsFactory( + Manopt.PolyakStepsize, args...; requires_manifold=false, kwargs... + ) +end + +@doc """ + WolfePowellLinesearchStepsize{R<:Real} <: Linesearch + +Do a backtracking line search to find a step size ``α`` that fulfils the +Wolfe conditions along a search direction ``X`` starting from ``p``. +See [`WolfePowellLinesearch`](@ref) for the math details + +# Fields + +* `sufficient_decrease::R`, `sufficient_curvature::R` two constants in the line search +$(_var(:Field, :X, "candidate_direction")) +$(_var(:Field, :p, "candidate_point"; add="as temporary storage for candidates")) +$(_var(:Field, :X, "candidate_tangent")) +* `last_stepsize::R` +* `max_stepsize::R` +$(_var(:Field, :retraction_method)) +* `stop_when_stepsize_less::R`: a safeguard to stop when the stepsize gets too small +$(_var(:Field, :vector_transport_method)) + +# Keyword arguments + +* `sufficient_decrease=10^(-4)` +* `sufficient_curvature=0.999` +$(_var(:Field, :p; add="as temporary storage for candidates")) +$(_var(:Field, :X; add="as type of memory allocated for the candidates direction and tangent")) +* `max_stepsize=`[`max_stepsize`](@ref)`(M, p)`: largest stepsize allowed here. 
+$(_var(:Keyword, :retraction_method)) * `stop_when_stepsize_less=0.0`: smallest stepsize when to stop (the last one before is taken) -* `vector_transport_method=ParallelTransport()`: the vector transport method to use +$(_var(:Keyword, :vector_transport_method)) """ -mutable struct WolfePowellLinesearch{ - TRM<:AbstractRetractionMethod,VTM<:AbstractVectorTransportMethod,P,T +mutable struct WolfePowellLinesearchStepsize{ + R<:Real,TRM<:AbstractRetractionMethod,VTM<:AbstractVectorTransportMethod,P,T } <: Linesearch - c1::Float64 - c2::Float64 + sufficient_decrease::R + sufficient_curvature::R candidate_direction::T candidate_point::P candidate_tangent::T - last_stepsize::Float64 - max_stepsize::Float64 + last_stepsize::R + max_stepsize::R retraction_method::TRM - stop_when_stepsize_less::Float64 + stop_when_stepsize_less::R vector_transport_method::VTM - function WolfePowellLinesearch( - M::AbstractManifold=DefaultManifold(), - c1::Float64=10^(-4), - c2::Float64=0.999; - candidate_point::P=allocate_result(M, rand), - candidate_tangent::T=allocate_result(M, zero_vector, candidate_point), - candidate_direction::T=allocate_result(M, zero_vector, candidate_point), + function WolfePowellLinesearchStepsize( + M::AbstractManifold; + p::P=allocate_result(M, rand), + X::T=zero_vector(M, p), max_stepsize::Real=max_stepsize(M), retraction_method::TRM=default_retraction_method(M), + sufficient_decrease::R=1e-4, + sufficient_curvature::R=0.999, vector_transport_method::VTM=default_vector_transport_method(M), - linesearch_stopsize::Float64=0.0, # deprecated remove on next breaking change - stop_when_stepsize_less::Float64=linesearch_stopsize, # - ) where {TRM,VTM,P,T} - (linesearch_stopsize > 0.0) && Base.depwarn( - WolfePowellLinesearch, - "`linesearch_backtrack` is deprecated – use `stop_when_stepsize_less` instead´.", - ) - return new{TRM,VTM,P,T}( - c1, - c2, - candidate_direction, - candidate_point, - candidate_tangent, + stop_when_stepsize_less::R=0.0, + ) where 
{TRM,VTM,P,T,R} + return new{R,TRM,VTM,P,T}( + sufficient_decrease, + sufficient_curvature, + X, + p, + copy(M, p, X), 0.0, max_stepsize, retraction_method, @@ -979,7 +1230,7 @@ mutable struct WolfePowellLinesearch{ ) end end -function (a::WolfePowellLinesearch)( +function (a::WolfePowellLinesearchStepsize)( mp::AbstractManoptProblem, ams::AbstractManoptSolverState, ::Int, @@ -1005,8 +1256,8 @@ function (a::WolfePowellLinesearch)( vector_transport_to!( M, a.candidate_direction, p, η, a.candidate_point, a.vector_transport_method ) - if fNew > f0 + a.c1 * step * l - while (fNew > f0 + a.c1 * step * l) && (s_minus > 10^(-9)) # decrease + if fNew > f0 + a.sufficient_decrease * step * l + while (fNew > f0 + a.sufficient_decrease * step * l) && (s_minus > 10^(-9)) # decrease s_minus = s_minus * 0.5 step = s_minus retract!(M, a.candidate_point, p, η, step, a.retraction_method) @@ -1019,8 +1270,9 @@ function (a::WolfePowellLinesearch)( ) get_gradient!(mp, a.candidate_tangent, a.candidate_point) if real(inner(M, a.candidate_point, a.candidate_tangent, a.candidate_direction)) < - a.c2 * l - while fNew <= f0 + a.c1 * step * l && (s_plus < max_step_increase)# increase + a.sufficient_curvature * l + while fNew <= f0 + a.sufficient_decrease * step * l && + (s_plus < max_step_increase)# increase s_plus = s_plus * 2.0 step = s_plus retract!(M, a.candidate_point, p, η, step, a.retraction_method) @@ -1035,11 +1287,11 @@ function (a::WolfePowellLinesearch)( ) get_gradient!(mp, a.candidate_tangent, a.candidate_point) while real(inner(M, a.candidate_point, a.candidate_tangent, a.candidate_direction)) < - a.c2 * l + a.sufficient_curvature * l step = (s_minus + s_plus) / 2 retract!(M, a.candidate_point, p, η, step, a.retraction_method) fNew = get_cost(mp, a.candidate_point) - if fNew <= f0 + a.c1 * step * l + if fNew <= f0 + a.sufficient_decrease * step * l s_minus = step else s_plus = step @@ -1055,83 +1307,113 @@ function (a::WolfePowellLinesearch)( a.last_stepsize = step return step 
 end
-function show(io::IO, a::WolfePowellLinesearch)
+function show(io::IO, a::WolfePowellLinesearchStepsize)
     return print(
         io,
         """
-        WolfePowellLinesearch(DefaultManifold(), $(a.c1), $(a.c2)) with keyword arguments
-          * retraction_method = $(a.retraction_method)
-          * vector_transport_method = $(a.vector_transport_method)""",
+        WolfePowellLinesearch(;
+            sufficient_decrease=$(a.sufficient_decrease)
+            sufficient_curvature=$(a.sufficient_curvature),
+            retraction_method = $(a.retraction_method)
+            vector_transport_method = $(a.vector_transport_method)
+            stop_when_stepsize_less = $(a.stop_when_stepsize_less)
+        )""",
     )
 end
-function status_summary(a::WolfePowellLinesearch)
+function status_summary(a::WolfePowellLinesearchStepsize)
     s = (a.last_stepsize > 0) ? "\nand the last stepsize used was $(a.last_stepsize)." : ""
     return "$a$s"
 end
+"""
+    WolfePowellLinesearch(; kwargs...)
+    WolfePowellLinesearch(M::AbstractManifold; kwargs...)
 
-_doc_WPBL_algorithm = raw"""Then with
+Perform a linesearch to fulfill both the Armijo-Goldstein condition
 ```math
-A(t) = f(x_+) ≤ c1 t ⟨\operatorname{grad}f(x), η⟩_{x}
-\quad\text{and}\quad
-W(t) = ⟨\operatorname{grad}f(x_+), \text{V}_{x_+\gets x}η⟩_{x_+} ≥ c_2 ⟨η, \operatorname{grad}f(x)⟩_x,
+f$(_tex(:bigl))( $(_tex(:retr))_{p}(αX) $(_tex(:bigr))) ≤ f(p) + c_1 α ⟨$(_tex(:grad)) f(p), X⟩_{p}
 ```
-where ``x_+ = \operatorname{retr}_x(tη)`` is the current trial point, and ``\text{V}`` is a
-vector transport.
-Then the following Algorithm is performed similar to Algorithm 7 from [Huang:2014](@cite)
+as well as the Wolfe conditions
 
-1. set ``α=0``, ``β=∞`` and ``t=1``.
-2. While either ``A(t)`` does not hold or ``W(t)`` does not hold do steps 3-5.
-3. If ``A(t)`` fails, set ``β=t``.
-4. If ``A(t)`` holds but ``W(t)`` fails, set ``α=t``.
-5. If ``β<∞`` set ``t=\frac{α+β}{2}``, otherwise set ``t=2α``.
+```math
+$(_tex(:deriv)) f$(_tex(:bigl))($(_tex(:retr))_{p}(tX)$(_tex(:bigr)))
+$(_tex(:Big))$(_tex(:vert))_{t=α}
+≥ c_2 $(_tex(:deriv)) f$(_tex(:bigl))($(_tex(:retr))_{p}(tX)$(_tex(:bigr)))$(_tex(:Big))$(_tex(:vert))_{t=0}.
+```
+
+for some given sufficient decrease coefficient ``c_1`` and some sufficient curvature condition coefficient ``c_2``.
+
+This is adapted from [NocedalWright:2006; Section 3.1](@cite).
+
+# Keyword arguments
+
+* `sufficient_decrease=10^(-4)`
+* `sufficient_curvature=0.999`
+$(_var(:Field, :p; add="as temporary storage for candidates"))
+$(_var(:Field, :X; add="as type of memory allocated for the candidates direction and tangent"))
+* `max_stepsize=`[`max_stepsize`](@ref)`(M, p)`: largest stepsize allowed here.
+$(_var(:Keyword, :retraction_method))
+* `stop_when_stepsize_less=0.0`: smallest stepsize when to stop (the last one before is taken)
+$(_var(:Keyword, :vector_transport_method))
 """
+WolfePowellLinesearch(args...; kwargs...) =
+    ManifoldDefaultsFactory(WolfePowellLinesearchStepsize, args...; kwargs...)
 
 @doc """
-    WolfePowellBinaryLinesearch <: Linesearch
-
-A [`Linesearch`](@ref) method that determines a step size `t` fulfilling the Wolfe conditions
+    WolfePowellBinaryLinesearchStepsize{R} <: Linesearch
 
-based on a binary chop. Let ``η`` be a search direction and ``c1,c_2>0`` be two constants.
+Do a backtracking line search to find a step size ``α`` that fulfils the
+Wolfe conditions along a search direction ``X`` starting from ``p``.
+See [`WolfePowellBinaryLinesearch`](@ref) for the math details.
 
-$_doc_WPBL_algorithm
+# Fields
 
-# Constructors
+* `sufficient_decrease::R`, `sufficient_curvature::R`: two constants in the line search
+* `last_stepsize::R`
+* `max_stepsize::R`
+$(_var(:Field, :retraction_method))
+* `stop_when_stepsize_less::R`: a safeguard to stop when the stepsize gets too small
+$(_var(:Field, :vector_transport_method))
 
-    WolfePowellLinesearch(M=DefaultManifold(), c1=10^(-4), c2=0.999; kwargs...)
+# Keyword arguments -## Keyword arguments +* `sufficient_decrease=10^(-4)` +* `sufficient_curvature=0.999` +* `max_stepsize=`[`max_stepsize`](@ref)`(M, p)`: largest stepsize allowed here. +$(_var(:Keyword, :retraction_method)) +* `stop_when_stepsize_less=0.0`: smallest stepsize when to stop (the last one before is taken) +$(_var(:Keyword, :vector_transport_method)) -* `linesearch_stopsize = 0.0`: a numerical barrier when to stop due to underflow -* $_kw_retraction_method_default: - $_kw_retraction_method -* $_kw_vector_transport_method_default: - $_kw_vector_transport_method """ -mutable struct WolfePowellBinaryLinesearch{ +mutable struct WolfePowellBinaryLinesearchStepsize{ TRM<:AbstractRetractionMethod,VTM<:AbstractVectorTransportMethod,F } <: Linesearch retraction_method::TRM vector_transport_method::VTM - c1::F - c2::F + sufficient_decrease::F + sufficient_curvature::F last_stepsize::F - linesearch_stopsize::F + stop_when_stepsize_less::F - function WolfePowellBinaryLinesearch( - M::AbstractManifold=DefaultManifold(), - c1::F=10^(-4), - c2::F=0.999; + function WolfePowellBinaryLinesearchStepsize( + M::AbstractManifold=DefaultManifold(); + sufficient_decrease::F=10^(-4), + sufficient_curvature::F=0.999, retraction_method::RTM=default_retraction_method(M), vector_transport_method::VTM=default_vector_transport_method(M), - linesearch_stopsize::F=0.0, - ) where {F,RTM<:AbstractRetractionMethod,VTM<:AbstractVectorTransportMethod} + stop_when_stepsize_less::F=0.0, + ) where {VTM<:AbstractVectorTransportMethod,RTM<:AbstractRetractionMethod,F} return new{RTM,VTM,F}( - retraction_method, vector_transport_method, c1, c2, 0.0, linesearch_stopsize + retraction_method, + vector_transport_method, + sufficient_decrease, + sufficient_curvature, + 0.0, + stop_when_stepsize_less, ) end end -function (a::WolfePowellBinaryLinesearch)( +function (a::WolfePowellBinaryLinesearchStepsize)( amp::AbstractManoptProblem, ams::AbstractManoptSolverState, ::Int, @@ -1147,13 +1429,16 @@ 
function (a::WolfePowellBinaryLinesearch)(
     fNew = get_cost(amp, xNew)
     η_xNew = vector_transport_to(M, get_iterate(ams), η, xNew, a.vector_transport_method)
     gradient_new = get_gradient(amp, xNew)
-    nAt = fNew > f0 + a.c1 * t * real(inner(M, get_iterate(ams), η, get_gradient(ams)))
+    nAt =
+        fNew >
+        f0 +
+        a.sufficient_decrease * t * real(inner(M, get_iterate(ams), η, get_gradient(ams)))
     nWt =
         real(inner(M, xNew, gradient_new, η_xNew)) <
-        a.c2 * real(inner(M, get_iterate(ams), η, get_gradient(ams)))
+        a.sufficient_curvature * real(inner(M, get_iterate(ams), η, get_gradient(ams)))
     while (nAt || nWt) &&
-          (t > a.linesearch_stopsize) &&
-          ((α + β) / 2 - 1 > a.linesearch_stopsize)
+          (t > a.stop_when_stepsize_less) &&
+          ((α + β) / 2 - 1 > a.stop_when_stepsize_less)
         nAt && (β = t) # A(t) fails
         (!nAt && nWt) && (α = t) # A(t) holds but W(t) fails
         t = isinf(β) ? 2 * α : (α + β) / 2
@@ -1165,186 +1450,76 @@ function (a::WolfePowellBinaryLinesearch)(
             M, η_xNew, get_iterate(ams), η, xNew, a.vector_transport_method
         )
         # Update conditions
-        nAt = fNew > f0 + a.c1 * t * real(inner(M, get_iterate(ams), η, get_gradient(ams)))
+        nAt =
+            fNew >
+            f0 +
+            a.sufficient_decrease *
+            t *
+            real(inner(M, get_iterate(ams), η, get_gradient(ams)))
         nWt =
             real(inner(M, xNew, gradient_new, η_xNew)) <
-            a.c2 * real(inner(M, get_iterate(ams), η, get_gradient(ams)))
+            a.sufficient_curvature * real(inner(M, get_iterate(ams), η, get_gradient(ams)))
     end
     a.last_stepsize = t
     return t
 end
-function show(io::IO, a::WolfePowellBinaryLinesearch)
+function show(io::IO, a::WolfePowellBinaryLinesearchStepsize)
     return print(
         io,
         """
-        WolfePowellBinaryLinesearch(DefaultManifold(), $(a.c1), $(a.c2)) with keyword arguments
-          * retraction_method = $(a.retraction_method)
-          * vector_transport_method = $(a.vector_transport_method)
-          * linesearch_stopsize = $(a.linesearch_stopsize)""",
+        WolfePowellBinaryLinesearch(;
+            sufficient_decrease=$(a.sufficient_decrease)
+            sufficient_curvature=$(a.sufficient_curvature),
+            retraction_method = $(a.retraction_method)
+            vector_transport_method = $(a.vector_transport_method)
+            stop_when_stepsize_less = $(a.stop_when_stepsize_less)
+        )""",
     )
 end
-function status_summary(a::WolfePowellBinaryLinesearch)
+function status_summary(a::WolfePowellBinaryLinesearchStepsize)
     s = (a.last_stepsize > 0) ? "\nand the last stepsize used was $(a.last_stepsize)." : ""
     return "$a$s"
 end
-@doc raw"""
-    AdaptiveWNGradient <: DirectionUpdateRule
-
-Represent an adaptive gradient method introduced by [GrapigliaStella:2023](@cite).
-
-Given a positive threshold ``\hat c \mathbb N``,
-an minimal bound ``b_{\mathrm{min}} > 0``,
-an initial ``b_0 ≥ b_{\mathrm{min}}``, and a
-gradient reduction factor threshold ``\alpha ∈ [0,1)``.
-
-Set ``c_0=0`` and use ``\omega_0 = \lVert \operatorname{grad} f(p_0) \rvert_{p_0}``.
-
-For the first iterate use the initial step size ``s_0 = \frac{1}{b_0}``.
-
-Then, given the last gradient ``X_{k-1} = \operatorname{grad} f(x_{k-1})``,
-and a previous ``\omega_{k-1}``, the values ``(b_k, \omega_k, c_k)`` are computed
-using ``X_k = \operatorname{grad} f(p_k)`` and the following cases
-
-If ``\lVert X_k \rVert_{p_k} \leq \alpha\omega_{k-1}``, then let
-``\hat b_{k-1} ∈ [b_\mathrm{min},b_{k-1}]`` and set
-
+_doc_WPBL_algorithm = """With
 ```math
-(b_k, \omega_k, c_k) = \begin{cases}
-\bigl(\hat b_{k-1}, \lVert X_k\rVert_{p_k}, 0 \bigr) & \text{ if } c_{k-1}+1 = \hat c\\
-\Bigl(b_{k-1} + \frac{\lVert X_k\rVert_{p_k}^2}{b_{k-1}}, \omega_{k-1}, c_{k-1}+1 \Bigr) & \text{ if } c_{k-1}+1<\hat c
-\end{cases}
+A(t) = f(p_+) ≤ c_1 t ⟨$(_tex(:grad))f(p), X⟩_{p}
+$(_tex(:quad))$(_tex(:text, " and "))$(_tex(:quad))
+W(t) = ⟨$(_tex(:grad))f(p_+), $(_math(:vector_transport, :symbol, "p_+", "p"))X⟩_{p_+} ≥ c_2 ⟨X, $(_tex(:grad))f(p)⟩_p,
 ```
-If ``\lVert X_k \rVert_{p_k} > \alpha\omega_{k-1}``, the set
-
-```math
-(b_k, \omega_k, c_k) =
-\Bigl( b_{k-1} + \frac{\lVert X_k\rVert_{p_k}^2}{b_{k-1}}, \omega_{k-1}, 0)
-```
-
-and return the step size ``s_k = \frac{1}{b_k}``.
-
-Note that for ``α=0`` this is the Riemannian variant of `WNGRad`.
-
-# Fields
-
-* `count_threshold::Int=4`: an `Integer` for ``\hat c``
-* `minimal_bound::Float64=1e-4`: for ``b_{\mathrm{min}}``
-* `alternate_bound::Function=(bk, hat_c) -> min(gradient_bound, max(gradient_bound, bk/(3*hat_c)`:
-  how to determine ``\hat b_k`` as a function of `(bmin, bk, hat_c) -> hat_bk`
-* `gradient_reduction::Float64=0.9`:
-* `gradient_bound` `norm(M, p0, grad_f(M,p0))` the bound ``b_k``.
+where ``p_+ = $(_tex(:retr))_p(tX)`` is the current trial point, and ``$(_math(:vector_transport, :symbol))`` denotes a
+vector transport.
+Then the following Algorithm is performed similarly to Algorithm 7 from [Huang:2014](@cite)
 
-as well as the internal fields
+1. set ``α=0``, ``β=∞`` and ``t=1``.
+2. While either ``A(t)`` does not hold or ``W(t)`` does not hold do steps 3-5.
+3. If ``A(t)`` fails, set ``β=t``.
+4. If ``A(t)`` holds but ``W(t)`` fails, set ``α=t``.
+5. If ``β<∞`` set ``t=$(_tex(:frac, "α+β","2"))``, otherwise set ``t=2α``.
+"""
 
-* `weight` for ``ω_k`` initialised to ``ω_0 = ```norm(M, p0, grad_f(M,p0))` if this is not zero, `1.0` otherwise.
-* `count` for the ``c_k``, initialised to ``c_0 = 0``.
+"""
+    WolfePowellBinaryLinesearch(; kwargs...)
+    WolfePowellBinaryLinesearch(M::AbstractManifold; kwargs...)
 
-# Constructor
+Perform a linesearch to fulfill both the Armijo-Goldstein conditions
+for some given sufficient decrease coefficient ``c_1`` and some sufficient curvature condition coefficient ``c_2``.
+Compared to [`WolfePowellLinesearch`](@ref Manopt.WolfePowellLinesearch) which tries a simpler method, this linesearch performs the following algorithm
 
-    AdaptiveWNGrad(M=DefaultManifold, grad_f=(M, p) -> zero_vector(M, rand(M)), p=rand(M); kwargs...)
+$(_doc_WPBL_algorithm)
 
-Where all fields with defaults are keyword arguments and additional keyword arguments are
+# Keyword arguments
 
-* `adaptive=true`: switches the `gradient_reduction ``α`` to `0`.
-* `evaluation=AllocatingEvaluation()`: specifies whether the gradient (that is used for initialisation only) is mutating or allocating +* `sufficient_decrease=10^(-4)` +* `sufficient_curvature=0.999` +* `max_stepsize=`[`max_stepsize`](@ref)`(M, p)`: largest stepsize allowed here. +$(_var(:Keyword, :retraction_method)) +* `stop_when_stepsize_less=0.0`: smallest stepsize when to stop (the last one before is taken) +$(_var(:Keyword, :vector_transport_method)) """ -mutable struct AdaptiveWNGradient{I<:Integer,R<:Real,F<:Function} <: Stepsize - count_threshold::I - minimal_bound::R - alternate_bound::F - gradient_reduction::R - gradient_bound::R - weight::R - count::I -end -function AdaptiveWNGradient( - M::AbstractManifold=DefaultManifold(), - (grad_f!!)=(M, p) -> zero_vector(M, rand(M)), - p=rand(M); - evaluation::E=AllocatingEvaluation(), - adaptive::Bool=true, - count_threshold::I=4, - minimal_bound::R=1e-4, - gradient_reduction::R=adaptive ? 0.9 : 0.0, - gradient_bound::R=norm( - M, - p, - if evaluation == AllocatingEvaluation() - grad_f!!(M, p) - else - grad_f!!(M, zero_vector(M, p), p) - end, - ), - alternate_bound::F=(bk, hat_c) -> min( - gradient_bound == 0 ? 1.0 : gradient_bound, max(minimal_bound, bk / (3 * hat_c)) - ), - kwargs..., -) where {I<:Integer,R<:Real,F<:Function,E<:AbstractEvaluationType} - if gradient_bound == 0 - # If the gradient bound defaults to zero, set it to 1 - gradient_bound = 1.0 - end - return AdaptiveWNGradient{I,R,F}( - count_threshold, - minimal_bound, - alternate_bound, - gradient_reduction, - gradient_bound, - gradient_bound, - 0, - ) -end -function (awng::AdaptiveWNGradient)( - mp::AbstractManoptProblem, s::AbstractGradientSolverState, i, args...; kwargs... 
-) - M = get_manifold(mp) - p = get_iterate(s) - X = get_gradient(mp, p) - isnan(awng.weight) || (awng.weight = norm(M, p, X)) # init ω_0 - if i == 0 # init fields - awng.weight = norm(M, p, X) # init ω_0 - (awng.weight == 0) && (awng.weight = 1.0) - awng.count = 0 - return 1 / awng.gradient_bound - end - grad_norm = norm(M, p, X) - if grad_norm < awng.gradient_reduction * awng.weight # grad norm < αω_{k-1} - if awng.count + 1 == awng.count_threshold - awng.gradient_bound = awng.alternate_bound( - awng.gradient_bound, awng.count_threshold - ) - awng.weight = grad_norm - awng.count = 0 - else - awng.gradient_bound = awng.gradient_bound + grad_norm^2 / awng.gradient_bound - #weight stays unchanged - awng.count += 1 - end - else - awng.gradient_bound = awng.gradient_bound + grad_norm^2 / awng.gradient_bound - #weight stays unchanged - awng.count = 0 - end - return 1 / awng.gradient_bound -end -get_initial_stepsize(awng::AdaptiveWNGradient) = 1 / awng.gradient_bound -get_last_stepsize(awng::AdaptiveWNGradient) = 1 / awng.gradient_bound -function show(io::IO, awng::AdaptiveWNGradient) - s = """ - AdaptiveWNGradient(; - count_threshold=$(awng.count_threshold), - minimal_bound=$(awng.minimal_bound), - alternate_bound=$(awng.alternate_bound), - gradient_reduction=$(awng.gradient_reduction), - gradient_bound=$(awng.gradient_bound) - ) - - as well as internally the weight ω_k = $(awng.weight) and current count c_k = $(awng.count). - """ - return print(io, s) -end +WolfePowellBinaryLinesearch(args...; kwargs...) = + ManifoldDefaultsFactory(WolfePowellBinaryLinesearchStepsize, args...; kwargs...) @doc raw""" get_stepsize(amp::AbstractManoptProblem, ams::AbstractManoptSolverState, vars...) @@ -1435,12 +1610,12 @@ return the last computed stepsize from within the stepsize. If no last step size is stored, this returns `NaN`. """ get_last_stepsize(::Stepsize, ::Any...) = NaN -function get_last_stepsize(step::ArmijoLinesearch, ::Any...) 
+function get_last_stepsize(step::ArmijoLinesearchStepsize, ::Any...) return step.last_stepsize end -function get_last_stepsize(step::WolfePowellLinesearch, ::Any...) +function get_last_stepsize(step::WolfePowellLinesearchStepsize, ::Any...) return step.last_stepsize end -function get_last_stepsize(step::WolfePowellBinaryLinesearch, ::Any...) +function get_last_stepsize(step::WolfePowellBinaryLinesearchStepsize, ::Any...) return step.last_stepsize end diff --git a/src/plans/stochastic_gradient_plan.jl b/src/plans/stochastic_gradient_plan.jl index 2f720958ba..beb29d0430 100644 --- a/src/plans/stochastic_gradient_plan.jl +++ b/src/plans/stochastic_gradient_plan.jl @@ -351,6 +351,6 @@ end """ AbstractStochasticGradientDescentSolverState <: AbstractManoptSolverState -A generic type for all options related to stochastic gradient descent methods +A generic type for all options related to gradient descent methods working with parts of the total gradient """ -abstract type AbstractGradientGroupProcessor <: DirectionUpdateRule end +abstract type AbstractGradientGroupDirectionRule <: DirectionUpdateRule end diff --git a/src/plans/stopping_criterion.jl b/src/plans/stopping_criterion.jl index da25f7e669..ee0aed5977 100644 --- a/src/plans/stopping_criterion.jl +++ b/src/plans/stopping_criterion.jl @@ -119,11 +119,11 @@ function show(io::IO, c::StopAfter) end """ - update_stopping_criterion!(c::StopAfter, :MaxTime, v::Period) + set_parameter!(c::StopAfter, :MaxTime, v::Period) Update the time period after which an algorithm shall stop. 
""" -function update_stopping_criterion!(c::StopAfter, ::Val{:MaxTime}, v::Period) +function set_parameter!(c::StopAfter, ::Val{:MaxTime}, v::Period) (value(v) < 0) && error("You must provide a positive time period") c.threshold = v return c @@ -179,11 +179,11 @@ function show(io::IO, c::StopAfterIteration) end """ - update_stopping_criterion!(c::StopAfterIteration, :;MaxIteration, v::Int) + set_parameter!(c::StopAfterIteration, :;MaxIteration, v::Int) Update the number of iterations after which the algorithm should stop. """ -function update_stopping_criterion!(c::StopAfterIteration, ::Val{:MaxIteration}, v::Int) +function set_parameter!(c::StopAfterIteration, ::Val{:MaxIteration}, v::Int) c.max_iterations = v return c end @@ -228,18 +228,8 @@ function StopWhenChangeLess( ε, zero(ε), storage, inverse_retraction_method, -1 ) end -function StopWhenChangeLess( - ε::F; - storage::StoreStateAction=StoreStateAction([:Iterate]), - manifold::AbstractManifold=DefaultManifold(), - inverse_retraction_method::IRT=default_inverse_retraction_method(manifold), -) where {F,IRT<:AbstractInverseRetractionMethod} - if !(manifold isa DefaultManifold) - @warn "The `manifold` keyword is deprecated, use the first positional argument `M` instead." - end - return StopWhenChangeLess{F,IRT,typeof(storage)}( - ε, zero(ε), storage, inverse_retraction_method, -1 - ) +function StopWhenChangeLess(ε::R; kwargs...) where {R<:Real} + return StopWhenChangeLess(DefaultManifold(), ε; kwargs...) 
end function (c::StopWhenChangeLess)(mp::AbstractManoptProblem, s::AbstractManoptSolverState, k) if k == 0 # reset on init @@ -271,15 +261,17 @@ function status_summary(c::StopWhenChangeLess) end indicates_convergence(c::StopWhenChangeLess) = true function show(io::IO, c::StopWhenChangeLess) - return print(io, "StopWhenChangeLess($(c.threshold))\n $(status_summary(c))") + return print( + io, "StopWhenChangeLess with threshold $(c.threshold)\n $(status_summary(c))" + ) end """ - update_stopping_criterion!(c::StopWhenChangeLess, :MinIterateChange, v::Int) + set_parameter!(c::StopWhenChangeLess, :MinIterateChange, v::Int) Update the minimal change below which an algorithm shall stop. """ -function update_stopping_criterion!(c::StopWhenChangeLess, ::Val{:MinIterateChange}, v) +function set_parameter!(c::StopWhenChangeLess, ::Val{:MinIterateChange}, v) c.threshold = v return c end @@ -333,11 +325,11 @@ function show(io::IO, c::StopWhenCostLess) end """ - update_stopping_criterion!(c::StopWhenCostLess, :MinCost, v) + set_parameter!(c::StopWhenCostLess, :MinCost, v) Update the minimal cost below which the algorithm shall stop """ -function update_stopping_criterion!(c::StopWhenCostLess, ::Val{:MinCost}, v) +function set_parameter!(c::StopWhenCostLess, ::Val{:MinCost}, v) c.threshold = v return c end @@ -419,11 +411,11 @@ function status_summary(sc::StopWhenEntryChangeLess) end """ - update_stopping_criterion!(c::StopWhenEntryChangeLess, :Threshold, v) + set_parameter!(c::StopWhenEntryChangeLess, :Threshold, v) Update the minimal cost below which the algorithm shall stop """ -function update_stopping_criterion!(c::StopWhenEntryChangeLess, ::Val{:Threshold}, v) +function set_parameter!(c::StopWhenEntryChangeLess, ::Val{:Threshold}, v) c.threshold = v return c end @@ -522,13 +514,11 @@ function show(io::IO, c::StopWhenGradientChangeLess) end """ - update_stopping_criterion!(c::StopWhenGradientChangeLess, :MinGradientChange, v) + set_parameter!(c::StopWhenGradientChangeLess, 
:MinGradientChange, v) Update the minimal change below which an algorithm shall stop. """ -function update_stopping_criterion!( - c::StopWhenGradientChangeLess, ::Val{:MinGradientChange}, v -) +function set_parameter!(c::StopWhenGradientChangeLess, ::Val{:MinGradientChange}, v) c.threshold = v return c end @@ -599,13 +589,11 @@ function show(io::IO, c::StopWhenGradientNormLess) end """ - update_stopping_criterion!(c::StopWhenGradientNormLess, :MinGradNorm, v::Float64) + set_parameter!(c::StopWhenGradientNormLess, :MinGradNorm, v::Float64) Update the minimal gradient norm when an algorithm shall stop """ -function update_stopping_criterion!( - c::StopWhenGradientNormLess, ::Val{:MinGradNorm}, v::Float64 -) +function set_parameter!(c::StopWhenGradientNormLess, ::Val{:MinGradNorm}, v::Float64) c.threshold = v return c end @@ -658,11 +646,11 @@ function show(io::IO, c::StopWhenStepsizeLess) return print(io, "StopWhenStepsizeLess($(c.threshold))\n $(status_summary(c))") end """ - update_stopping_criterion!(c::StopWhenStepsizeLess, :MinStepsize, v) + set_parameter!(c::StopWhenStepsizeLess, :MinStepsize, v) Update the minimal step size below which the algorithm shall stop """ -function update_stopping_criterion!(c::StopWhenStepsizeLess, ::Val{:MinStepsize}, v) +function set_parameter!(c::StopWhenStepsizeLess, ::Val{:MinStepsize}, v) c.threshold = v return c end @@ -855,13 +843,11 @@ function show(io::IO, c::StopWhenSubgradientNormLess) ) end """ - update_stopping_criterion!(c::StopWhenSubgradientNormLess, :MinSubgradNorm, v::Float64) + set_parameter!(c::StopWhenSubgradientNormLess, :MinSubgradNorm, v::Float64) Update the minimal subgradient norm when an algorithm shall stop """ -function update_stopping_criterion!( - c::StopWhenSubgradientNormLess, ::Val{:MinSubgradNorm}, v::Float64 -) +function set_parameter!(c::StopWhenSubgradientNormLess, ::Val{:MinSubgradNorm}, v::Float64) c.threshold = v return c end @@ -871,7 +857,7 @@ end # @doc raw""" - StopWhenAll <: 
StoppingCriterion + StopWhenAll <: StoppingCriterionSet store an array of [`StoppingCriterion`](@ref) elements and indicates to stop, when _all_ indicate to stop. The `reason` is given by the concatenation of all @@ -931,13 +917,13 @@ If either `s1` (or `s2`) is already an [`StopWhenAll`](@ref), then `s2` (or `s1` appended to the list of [`StoppingCriterion`](@ref) within `s1` (or `s2`). # Example - a = StopAfterIteration(200) & StopWhenChangeLess(1e-6) + a = StopAfterIteration(200) & StopWhenChangeLess(M, 1e-6) b = a & StopWhenGradientNormLess(1e-6) Is the same as - a = StopWhenAll(StopAfterIteration(200), StopWhenChangeLess(1e-6)) - b = StopWhenAll(StopAfterIteration(200), StopWhenChangeLess(1e-6), StopWhenGradientNormLess(1e-6)) + a = StopWhenAll(StopAfterIteration(200), StopWhenChangeLess(M, 1e-6)) + b = StopWhenAll(StopAfterIteration(200), StopWhenChangeLess(M, 1e-6), StopWhenGradientNormLess(1e-6)) """ function Base.:&(s1::S, s2::T) where {S<:StoppingCriterion,T<:StoppingCriterion} return StopWhenAll(s1, s2) @@ -953,7 +939,7 @@ function Base.:&(s1::StopWhenAll, s2::StopWhenAll) end @doc raw""" - StopWhenAny <: StoppingCriterion + StopWhenAny <: StoppingCriterionSet store an array of [`StoppingCriterion`](@ref) elements and indicates to stop, when _any_ single one indicates to stop. 
The `reason` is given by the @@ -1026,13 +1012,13 @@ If either `s1` (or `s2`) is already an [`StopWhenAny`](@ref), then `s2` (or `s1` appended to the list of [`StoppingCriterion`](@ref) within `s1` (or `s2`) # Example - a = StopAfterIteration(200) | StopWhenChangeLess(1e-6) + a = StopAfterIteration(200) | StopWhenChangeLess(M, 1e-6) b = a | StopWhenGradientNormLess(1e-6) Is the same as - a = StopWhenAny(StopAfterIteration(200), StopWhenChangeLess(1e-6)) - b = StopWhenAny(StopAfterIteration(200), StopWhenChangeLess(1e-6), StopWhenGradientNormLess(1e-6)) + a = StopWhenAny(StopAfterIteration(200), StopWhenChangeLess(M, 1e-6)) + b = StopWhenAny(StopAfterIteration(200), StopWhenChangeLess(M, 1e-6), StopWhenGradientNormLess(1e-6)) """ function Base.:|(s1::S, s2::T) where {S<:StoppingCriterion,T<:StoppingCriterion} return StopWhenAny(s1, s2) @@ -1085,45 +1071,22 @@ end get_stopping_criteria(c::StopWhenAll) = c.criteria get_stopping_criteria(c::StopWhenAny) = c.criteria -@doc raw""" - update_stopping_criterion!(c::Stoppingcriterion, s::Symbol, v::value) - update_stopping_criterion!(s::AbstractManoptSolverState, symbol::Symbol, v::value) - update_stopping_criterion!(c::Stoppingcriterion, ::Val{Symbol}, v::value) - -Update a value within a stopping criterion, specified by the symbol `s`, to `v`. -If a criterion does not have a value assigned that corresponds to `s`, the update is ignored. - -For the second signature, the stopping criterion within the [`AbstractManoptSolverState`](@ref) `o` is updated. - -To see which symbol updates which value, see the specific stopping criteria. They should -use dispatch per symbol value (the third signature). -""" -update_stopping_criterion!(c, s, v) - -function update_stopping_criterion!(s::AbstractManoptSolverState, symbol::Symbol, v) - update_stopping_criterion!(s.stop, symbol, v) +function set_parameter!(s::AbstractManoptSolverState, ::Val{:StoppingCriterion}, args...) + set_parameter!(s.stop, args...) 
return s end -function update_stopping_criterion!(c::StopWhenAll, s::Symbol, v) +function set_parameter!(c::StopWhenAll, s::Symbol, v) for d in c.criteria - update_stopping_criterion!(d, s, v) + set_parameter!(d, s, v) end return c end -function update_stopping_criterion!(c::StopWhenAny, s::Symbol, v) +function set_parameter!(c::StopWhenAny, s::Symbol, v) for d in c.criteria - update_stopping_criterion!(d, s, v) + set_parameter!(d, s, v) end return c end -function update_stopping_criterion!(c::StoppingCriterion, s::Symbol, v::Any) - update_stopping_criterion!(c, Val(s), v) - return c -end -# fallback: do nothing -function update_stopping_criterion!(c::StoppingCriterion, ::Val, v) - return c -end @doc raw""" get_reason(s::AbstractManoptSolverState) diff --git a/src/plans/subsolver_plan.jl b/src/plans/subsolver_plan.jl index 2f5e6abaaa..9abbcd2fd4 100644 --- a/src/plans/subsolver_plan.jl +++ b/src/plans/subsolver_plan.jl @@ -106,40 +106,36 @@ By default this returns `ams.sub_state`. get_sub_state(ams::AbstractSubProblemSolverState) = ams.sub_state """ - set_manopt_parameter!(ams::AbstractManoptSolverState, element::Symbol, args...) + set_parameter!(ams::AbstractManoptSolverState, element::Symbol, args...) Set a certain field or semantic element from the [`AbstractManoptSolverState`](@ref) `ams` to `value`. This function passes to `Val(element)` and specific setters should dispatch on `Val{element}`. By default, this function just does nothing. """ -function set_manopt_parameter!(ams::AbstractManoptSolverState, e::Symbol, args...) - return set_manopt_parameter!(ams, Val(e), args...) +function set_parameter!(ams::AbstractManoptSolverState, e::Symbol, args...) + return set_parameter!(ams, Val(e), args...) end # Default: do nothing -function set_manopt_parameter!(ams::AbstractManoptSolverState, ::Val, args...) +function set_parameter!(ams::AbstractManoptSolverState, ::Val, args...) 
    return ams
 end
 
 """
-    set_manopt_parameter!(ams::DebugSolverState, ::Val{:SubProblem}, args...)
+    set_parameter!(ams::AbstractSubProblemSolverState, ::Val{:SubProblem}, args...)
 
 Set certain values specified by `args...` to the sub problem.
 """
-function set_manopt_parameter!(
-    ams::AbstractSubProblemSolverState, ::Val{:SubProblem}, args...
-)
-    set_manopt_parameter!(get_sub_problem(ams), args...)
+function set_parameter!(ams::AbstractSubProblemSolverState, ::Val{:SubProblem}, args...)
+    set_parameter!(get_sub_problem(ams), args...)
     return ams
 end
 
 """
-    set_manopt_parameter!(ams::DebugSolverState, ::Val{:SubState}, args...)
+    set_parameter!(ams::AbstractSubProblemSolverState, ::Val{:SubState}, args...)
 
 Set certain values specified by `args...` to the sub state.
 """
-function set_manopt_parameter!(
-    ams::AbstractSubProblemSolverState, ::Val{:SubState}, args...
-)
-    set_manopt_parameter!(get_sub_state(ams), args...)
+function set_parameter!(ams::AbstractSubProblemSolverState, ::Val{:SubState}, args...)
+    set_parameter!(get_sub_state(ams), args...)
     return ams
 end
diff --git a/src/solvers/ChambollePock.jl b/src/solvers/ChambollePock.jl
index 7c405bfb35..cc4b5fe508 100644
--- a/src/solvers/ChambollePock.jl
+++ b/src/solvers/ChambollePock.jl
@@ -7,28 +7,26 @@
 stores all options and variables within a linearized or exact Chambolle Pock.
* `acceleration::R`: acceleration factor * `dual_stepsize::R`: proximal parameter of the dual prox -* $(_field_inv_retr) -* `inverse_retraction_method_dual::`[`AbstractInverseRetractionMethod`](@extref `ManifoldsBase.AbstractInverseRetractionMethod`): - an inverse retraction ``$(_l_retr)^{-1}`` on ``$(_l_Manifold("N"))`` -* `m::P`: base point on ``$(_l_M)`` -* `n::Q`: base point on ``$(_l_Manifold("N"))`` -* `p::P`: an initial point on ``p^{(0)} ∈ $(_l_M)`` +$(_var(:Field, :inverse_retraction_method)) +$(_var(:Field, :inverse_retraction_method, "inverse_retraction_method_dual"; M="N", p="n")) +* `m::P`: base point on ``$(_math(:M))`` +* `n::Q`: base point on ``$(_tex(:Cal, "N"))`` +* `p::P`: an initial point on ``p^{(0)} ∈ $(_math(:M))`` * `pbar::P`: the relaxed iterate used in the next dual update step (when using `:primal` relaxation) * `primal_stepsize::R`: proximal parameter of the primal prox -* `X::T`: an initial tangent vector ``X^{(0)} ∈ T_{p^{(0)}}$(_l_M)`` +* `X::T`: an initial tangent vector ``X^{(0)} ∈ T_{p^{(0)}}$(_math(:M))`` * `Xbar::T`: the relaxed iterate used in the next primal update step (when using `:dual` relaxation) * `relaxation::R`: relaxation in the primal relaxation step (to compute `pbar`: * `relax::Symbol: which variable to relax (`:primal` or `:dual`: -* $(_field_retr) -* `stop`: a [`StoppingCriterion`](@ref) +$(_var(:Field, :retraction_method)) +$(_var(:Field, :stopping_criterion, "stop")) * `variant`: whether to perform an `:exact` or `:linearized` Chambolle-Pock * `update_primal_base`: function `(pr, st, k) -> m` to update the primal base * `update_dual_base`: function `(pr, st, k) -> n` to update the dual base -* $(_field_vector_transp) -* `vector_transport_method_dual::`[`AbstractVectorTransportMethod`](@extref `ManifoldsBase.AbstractVectorTransportMethod`): - a vector transport ``$(_l_vt)``on ``$(_l_Manifold("N"))`` +$(_var(:Field, :vector_transport_method)) +$(_var(:Field, :vector_transport_method, 
vector_transport_method_dual"; M="N"))
 
-Here, `P` is a point type on ``$(_l_M)``, `T` its tangent vector type, `Q` a point type on ``$(_l_Manifold("N"))``,
+Here, `P` is a point type on ``$(_math(:M))``, `T` its tangent vector type, `Q` a point type on ``$(_tex(:Cal, "N"))``,
 and `R<:Real` is a real number type
 
 where for the last two the functions a [`AbstractManoptProblem`](@ref)` p`,
@@ -38,29 +36,30 @@ If you activate these to be different from the default identity, you have to pro
 
 # Constructor
 
-    ChambollePockState(M::AbstractManifold, N::AbstractManifold,
-        m::P, n::Q, p::P, X::T, primal_stepsize::R, dual_stepsize::R;
+    ChambollePockState(M::AbstractManifold, N::AbstractManifold;
         kwargs...
     ) where {P, Q, T, R <: Real}
 
 # Keyword arguments
 
+* `n=`$(Manopt._link(:rand; M="N"))
+* `p=`$(Manopt._link(:rand))
+* `m=`$(Manopt._link(:rand))
+* `X=`$(Manopt._link(:zero_vector))
 * `acceleration=0.0`
 * `dual_stepsize=1/sqrt(8)`
 * `primal_stepsize=1/sqrt(8)`
-* $_kw_inverse_retraction_method_default: $_kw_inverse_retraction_method
-* `inverse_retraction_method_dual=`[`default_inverse_retraction_method`](@extref `ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}`)`(N, typeof(n))`
-  an inverse retraction ``$(_l_retr)^{-1}`` to use on ``$(_l_Manifold("N"))``, see [the section on retractions and their inverses](@extref ManifoldsBase :doc:`retractions`).
+$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :inverse_retraction_method, "inverse_retraction_method_dual"; M="N", p="n")) * `relaxation=1.0` * `relax=:primal`: relax the primal variable by default -* $_kw_retraction_method_default: $_kw_retraction_method -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(300)` +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(300)`")) * `variant=:exact`: run the exact Chambolle-Pock by default * `update_primal_base=missing` * `update_dual_base=missing` -* $_kw_vector_transport_method_default: $_kw_vector_transport_method -* `vector_transport_method=`[`default_vector_transport_method`](@extref `ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}`)`(N, typeof(n))`: - a vector transport ``$(_l_vt)`` to use on ``$(_l_Manifold("N"))``, see [the section on vector transports](@extref ManifoldsBase :doc:`vector_transports`). +$(_var(:Keyword, :vector_transport_method)) +$(_var(:Keyword, :vector_transport_method, "vector_transport_method_dual"; M="N", p="n")) If `Manifolds.jl` is loaded, `N` is also a keyword argument and set to `TangentBundle(M)` by default. """ @@ -99,11 +98,11 @@ mutable struct ChambollePockState{ end function Manopt.ChambollePockState( M::AbstractManifold, - N::AbstractManifold, - m::P, - n::Q, - p::P, - X::T; + N::AbstractManifold; + m::P=rand(M), + n::Q=rand(N), + p::P=rand(M), + X::T=zero_vector(M, p), primal_stepsize::R=1 / sqrt(8), dual_stepsize::R=1 / sqrt(8), acceleration::R=0.0, @@ -188,6 +187,8 @@ function set_iterate!(apds::AbstractPrimalDualSolverState, p) return apds end +_tex_DΛ = "DΛ: T_{m}$(_math(:M)) → T_{Λ(m)}$(_tex(:Cal, "N"))" + _doc_ChambollePock_formula = raw""" Given a `cost` function ``\mathcal E:\mathcal M → ℝ`` of the form ```math @@ -210,14 +211,14 @@ This can be done inplace of ``p``. 
# Input parameters - $_arg_M - * `N`, a manifold ``$(_l_Manifold("N"))`` -$_arg_p -$_arg_X -* `m`, a base point on $_l_M -* `n`, a base point on $(_l_Manifold("N")) -* `adjoint_linearized_operator`: the adjoint ``DΛ^*`` of the linearized operator ``$_l_DΛ`` -* `prox_F, prox_G_Dual`: the proximal maps of ``F`` and ``G^\\ast_n`` +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :M, "N"; type=true)) +$(_var(:Argument, :p)) +$(_var(:Argument, :X)) +$(_var(:Argument, :p, "m")) +$(_var(:Argument, :p, "n"; M="N")) +* `adjoint_linearized_operator`: the adjoint ``DΛ^*`` of the linearized operator ``$(_tex_DΛ)`` +* `prox_F, prox_G_Dual`: the proximal maps of ``F`` and ``G^$(_tex(:ast))_n`` note that depending on the [`AbstractEvaluationType`](@ref) `evaluation` the last three parameters as well as the forward operator `Λ` and the `linearized_forward_operator` can be given as @@ -233,10 +234,9 @@ For more details on the algorithm, see [BergmannHerzogSilvaLouzeiroTenbrinckVida * `acceleration=0.05`: acceleration parameter * `dual_stepsize=1/sqrt(8)`: proximal parameter of the dual prox -* $_kw_evaluation_default: $_kw_evaluation -* $_kw_inverse_retraction_method_default: $_kw_inverse_retraction_method -* `inverse_retraction_method_dual=`[`default_inverse_retraction_method`](@extref `ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}`)`(N, typeof(n))` - an inverse retraction ``$(_l_retr)^{-1}`` to use on $(_l_Manifold("N")), see [the section on retractions and their inverses](@extref ManifoldsBase :doc:`retractions`). 
+$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :inverse_retraction_method, "inverse_retraction_method_dual"; M="N", p="n")) * `Λ=missing`: the (forward) operator ``Λ(⋅)`` (required for the `:exact` variant) * `linearized_forward_operator=missing`: its linearization ``DΛ(⋅)[⋅]`` (required for the `:linearized` variant) * `primal_stepsize=1/sqrt(8)`: proximal parameter of the primal prox @@ -244,15 +244,14 @@ For more details on the algorithm, see [BergmannHerzogSilvaLouzeiroTenbrinckVida * `relax=:primal`: whether to relax the primal or dual * `variant=:exact` if `Λ` is missing, otherwise `:linearized`: variant to use. Note that this changes the arguments the `forward_operator` is called with. -* `stopping_criterion=[StopAfterIteration`](@ref)`(100)`: $_kw_stopping_criterion +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(100)`")) * `update_primal_base=missing`: function to update `m` (identity by default/missing) * `update_dual_base=missing`: function to update `n` (identity by default/missing) -* $_kw_retraction_method_default: $_kw_retraction_method -* $_kw_vector_transport_method_default: $_kw_vector_transport_method -* `vector_transport_method_dual=`[`default_vector_transport_method`](@extref `ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}`)`(N, typeof(n))`: - a vector transport ``$_l_vt`` to use on $(_l_Manifold("N")), see [the section on vector transports](@extref ManifoldsBase :doc:`vector_transports`). 
+$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :vector_transport_method)) +$(_var(:Keyword, :vector_transport_method, "vector_transport_method_dual"; M="N", p="n")) -$_doc_sec_output +$(_note(:OutputSection)) """ @doc "$(_doc_ChambollePock)" @@ -340,11 +339,11 @@ function ChambollePock!( tmp = TwoManifoldProblem(M, N, dpdmo) cps = ChambollePockState( M, - m, - n, - p, - X; - N=N, + N; + m=m, + n=n, + p=p, + X=X, primal_stepsize=primal_stepsize, dual_stepsize=dual_stepsize, acceleration=acceleration, diff --git a/src/solvers/DouglasRachford.jl b/src/solvers/DouglasRachford.jl index 1bafb772ed..871c16eed3 100644 --- a/src/solvers/DouglasRachford.jl +++ b/src/solvers/DouglasRachford.jl @@ -8,38 +8,38 @@ Store all options required for the DouglasRachford algorithm, * `α`: relaxation of the step from old to new iterate, to be precise ``x^{(k+1)} = g(α(k); x^{(k)}, t^{(k)})``, where ``t^{(k)}`` is the result of the double reflection involved in the DR algorithm -* `inverse_retraction_method`: an inverse retraction method +$(_var(:Field, :inverse_retraction_method)) * `λ`: function to provide the value for the proximal parameter during the calls * `parallel`: indicate whether to use a parallel Douglas-Rachford or not. * `R`: method employed in the iteration to perform the reflection of `x` at the prox `p`. -* $(_field_iterate) +$(_var(:Field, :p; add=[:as_Iterate])) For the parallel Douglas-Rachford, this is not a value from the `PowerManifold` manifold but the mean. * `reflection_evaluation`: whether `R` works in-place or allocating -* $(_field_retr) +$(_var(:Field, :retraction_method)) * `s`: the last result of the double reflection at the proximal maps relaxed by `α`. -* $(_field_stop) +$(_var(:Field, :stopping_criterion, "stop")) # Constructor - DouglasRachfordState(M, p; kwargs...) + DouglasRachfordState(M::AbstractManifold; kwargs...) 
# Input -* $(_arg_M) -* $(_arg_p) +$(_var(:Argument, :M; type=true)) # Keyword arguments * `α= k -> 0.9`: relaxation of the step from old to new iterate, to be precise ``x^{(k+1)} = g(α(k); x^{(k)}, t^{(k)})``, where ``t^{(k)}`` is the result of the double reflection involved in the DR algorithm -* $(_kw_inverse_retraction_method_default): $(_kw_inverse_retraction_method) +$(_var(:Keyword, :inverse_retraction_method)) * `λ= k -> 1.0`: function to provide the value for the proximal parameter during the calls +$(_var(:Keyword, :p; add=:as_Initial)) * `R=`[`reflect`](@ref)`(!)`: method employed in the iteration to perform the reflection of `p` at the prox of `p`; which function is used depends on `reflection_evaluation`. * `reflection_evaluation=`[`AllocatingEvaluation`](@ref)`()`: specify whether the reflection works in-place or allocating (default) -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(300)`: $(_kw_stopping_criterion) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(300)`")) * `parallel=false`: indicate whether to use a parallel Douglas-Rachford or not. """ mutable struct DouglasRachfordState{ @@ -65,8 +65,8 @@ mutable struct DouglasRachfordState{ stop::S parallel::Bool function DouglasRachfordState( - M::AbstractManifold, - p::P; + M::AbstractManifold; + p::P=rand(M), λ::Fλ=i -> 1.0, α::Fα=i -> 0.9, reflection_evaluation::E=AllocatingEvaluation(), @@ -147,11 +147,11 @@ _doc_Douglas_Rachford = """ DouglasRachford!(M, f, proxes_f, p) DouglasRachford!(M, mpo, p) -Compute the Douglas-Rachford algorithm on the manifold ``$(_l_M)``, starting from `p`` +Compute the Douglas-Rachford algorithm on the manifold ``$(_math(:M))``, starting from `p` given the (two) proximal maps `proxes_f`, see [BergmannPerschSteidl:2016](@cite). 
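As a side note for reviewers of this docstring, the double-reflection update described above reduces, in the Euclidean special case, to `2*prox(x) - x` in place of the retraction-based `reflect`. The sketch below is illustrative only and not part of Manopt.jl; the helper name `douglas_rachford` and its parameters are assumptions made for the example.

```python
# Euclidean sketch of the Douglas-Rachford iteration: reflect at the first
# prox, reflect the result at the second prox, then relax with factor alpha
# (the manifold version replaces 2*prox(x) - x by a retraction-based reflect).
def douglas_rachford(prox1, prox2, s, alpha=0.9, lam=1.0, iterations=100):
    for _ in range(iterations):
        r1 = 2.0 * prox1(lam, s) - s    # first reflection
        r2 = 2.0 * prox2(lam, r1) - r1  # second (double) reflection
        s = s + alpha * (r2 - s)        # relaxed update of the iterate
    return prox1(lam, s)

# Feasibility example: the prox of an indicator function is the metric
# projection, here onto the intervals [0, 2] and [1, 3].
proj1 = lambda lam, x: min(max(x, 0.0), 2.0)
proj2 = lambda lam, x: min(max(x, 1.0), 3.0)
x = douglas_rachford(proj1, proj2, 5.0)  # lands in the intersection [1, 2]
```

With the relaxation `alpha=0.9` mirroring the `α= k -> 0.9` default above, the iterate settles at a fixed point whose prox lies in the intersection of the two sets.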
For ``k>2`` proximal maps, the problem is reformulated using the parallel Douglas-Rachford: -a vectorial proximal map on the power manifold ``$(_l_M)^k`` is introduced as the first +a vectorial proximal map on the power manifold ``$(_math(:M))^k`` is introduced as the first proximal map and the second proximal map is set to the [`mean`](@extref Statistics.mean-Tuple{AbstractManifold, Vararg{Any}}) (Riemannian center of mass). This hence also boils down to two proximal maps, though each evaluates proximal maps in parallel, that is, component wise in a vector. @@ -166,36 +166,35 @@ If you provide a [`ManifoldProximalMapObjective`](@ref) `mpo` instead, the proxi # Input -* $(_arg_M) -* $(_arg_f) +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) * `proxes_f`: functions of the form `(M, λ, p) -> q` performing a proximal map, where `λ` denotes the proximal parameter, for each of the summands of `F`. These can also be given in the [`InplaceEvaluation`](@ref) variants `(M, q, λ, p) -> q` computing in place of `q`. -* $(_arg_p) +$(_var(:Argument, :p)) # Keyword arguments * `α= k -> 0.9`: relaxation of the step from old to new iterate, to be precise ``p^{(k+1)} = g(α_k; p^{(k)}, q^{(k)})``, where ``q^{(k)}`` is the result of the double reflection involved in the DR algorithm and ``g`` is a curve induced by the retraction and its inverse. -* $(_kw_evaluation_default): $(_kw_evaluation) -* $(_kw_inverse_retraction_method_default): $(_kw_inverse_retraction_method) +$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :inverse_retraction_method)) This is used both in the relaxation step as well as in the reflection, unless you set `R` yourself. * `λ= k -> 1.0`: function to provide the value for the proximal parameter ``λ_k`` * `R=reflect(!)`: method employed in the iteration to perform the reflection of `p` at the prox of `p`. 
This uses by default [`reflect`](@ref) or `reflect!` depending on `reflection_evaluation` and the retraction and inverse retraction specified by `retraction_method` and `inverse_retraction_method`, respectively. * `reflection_evaluation`: ([`AllocatingEvaluation`](@ref)) whether `R` works in-place or allocating -* $(_kw_retraction_method_default): $(_kw_retraction_method) +$(_var(:Keyword, :retraction_method)) This is used both in the relaxation step as well as in the reflection, unless you set `R` yourself. -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$(_sc_any)[`StopWhenChangeLess`](@ref)`(1e-5)`: - $(_kw_stopping_criterion) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`$(_sc(:Any))[`StopWhenChangeLess`](@ref)`(1e-5)`")) * `parallel=false`: indicate whether to use a parallel Douglas-Rachford or not. -$(_kw_others) +$(_note(:OtherKeywords)) -$(_doc_sec_output) +$(_note(:OutputSection)) """ @doc "$(_doc_Douglas_Rachford)" @@ -209,30 +208,15 @@ function DouglasRachford( parallel=0, kwargs..., ) where {TF} - N, f_, (prox1, prox2), parallel_, p0 = parallel_to_alternating_DR( - M, f, proxes_f, p, parallel, evaluation + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + proxes_f_ = [_ensure_mutating_prox(prox_f, p, evaluation) for prox_f in proxes_f] + N, f__, (prox1, prox2), parallel_, q = parallel_to_alternating_DR( + M, f_, proxes_f_, p_, parallel, evaluation ) - mpo = ManifoldProximalMapObjective(f_, (prox1, prox2); evaluation=evaluation) - return DouglasRachford(N, mpo, p0; evaluation=evaluation, parallel=parallel_, kwargs...) 
-end -function DouglasRachford( - M::AbstractManifold, - f::TF, - proxes_f::Vector{<:Any}, - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) where {TF} - q = [p] - f_(M, p) = f(M, p[]) - if evaluation isa AllocatingEvaluation - proxes_f_ = [(M, λ, p) -> [pf(M, λ, p[])] for pf in proxes_f] - else - proxes_f_ = [(M, q, λ, p) -> (q .= [pf(M, λ, p[])]) for pf in proxes_f] - end - rs = DouglasRachford(M, f_, proxes_f_, q; evaluation=evaluation, kwargs...) - #return just a number if the return type is the same as the type of q - return (typeof(q) == typeof(rs)) ? rs[] : rs + mpo = ManifoldProximalMapObjective(f__, (prox1, prox2); evaluation=evaluation) + rs = DouglasRachford(N, mpo, q; evaluation=evaluation, parallel=parallel_, kwargs...) + return _ensure_matching_output(p, rs) end function DouglasRachford( M::AbstractManifold, mpo::O, p; kwargs... @@ -292,7 +276,7 @@ function DouglasRachford!( end, parallel::Int=0, stopping_criterion::StoppingCriterion=StopAfterIteration(200) | - StopWhenChangeLess(1e-5), + StopWhenChangeLess(M, 1e-5), kwargs..., #especially may contain decorator options ) where { Tλ, @@ -304,8 +288,8 @@ function DouglasRachford!( dmpo = decorate_objective!(M, mpo; kwargs...) dmp = DefaultManoptProblem(M, dmpo) drs = DouglasRachfordState( - M, - p; + M; + p=p, λ=λ, α=α, R=R, diff --git a/src/solvers/FrankWolfe.jl b/src/solvers/FrankWolfe.jl index 56f7ff9087..08848e0648 100644 --- a/src/solvers/FrankWolfe.jl +++ b/src/solvers/FrankWolfe.jl @@ -14,14 +14,15 @@ It comes in two forms, depending on the realisation of the `subproblem`. 
# Fields -* $_field_iterate -* $_field_gradient -* $_field_inv_retr -* $_field_sub_problem -* $_field_sub_state -* $_field_stop -* $_field_step -* $_field_retr +$(_var(:Field, :p; add=[:as_Iterate])) +$(_var(:Field, :X; add=[:as_Gradient])) +$(_var(:Field, :inverse_retraction_method)) +$(_var(:Field, :vector_transport_method)) +$(_var(:Field, :sub_problem)) +$(_var(:Field, :sub_state)) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :stepsize)) +$(_var(:Field, :retraction_method)) The sub task requires a method to solve @@ -29,25 +30,28 @@ $_doc_FW_sub # Constructor - FrankWolfeState(M, p, sub_problem, sub_state; kwargs...) + FrankWolfeState(M, sub_problem, sub_state; kwargs...) -Initialise the Frank Wolfe method state with. +Initialise the Frank Wolfe method state. + +FrankWolfeState(M, sub_problem; evaluation=AllocatingEvaluation(), kwargs...) + +Initialise the Frank Wolfe method state, where `sub_problem` is a closed form solution with `evaluation` as type of evaluation. ## Input -* $_arg_M -* $_arg_p -* $_arg_X -* $_arg_sub_problem -* $_arg_sub_state +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :sub_problem)) +$(_var(:Argument, :sub_state)) ## Keyword arguments -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$_sc_any[`StopWhenGradientNormLess`](@ref)`(1e-6)` $_kw_stop_note -* `stepsize=`[`default_stepsize`](@ref)`(M, FrankWolfeState)` -* $_kw_retraction_method_default -* $_kw_inverse_retraction_method_default -* $_kw_X_default +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-6)`")) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, FrankWolfeState)`")) +$(_var(:Keyword, :X; add=:as_Memory)) where the remaining fields from before are keyword arguments. 
""" @@ -71,30 +75,29 @@ mutable struct FrankWolfeState{ inverse_retraction_method::ITM function FrankWolfeState( M::AbstractManifold, - p::P, sub_problem::Pr, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}; - initial_vector::T=zero_vector(M, p), #deprecated - X::T=initial_vector, + sub_state::St; + p::P=rand(M), + X::T=zero_vector(M, p), stopping_criterion::TStop=StopAfterIteration(200) | StopWhenGradientNormLess(1e-6), stepsize::TStep=default_stepsize(M, FrankWolfeState), retraction_method::TM=default_retraction_method(M, typeof(p)), inverse_retraction_method::ITM=default_inverse_retraction_method(M, typeof(p)), ) where { P, - Pr, T, + Pr<:Union{AbstractManoptProblem,F} where {F}, + St<:AbstractManoptSolverState, TStop<:StoppingCriterion, TStep<:Stepsize, TM<:AbstractRetractionMethod, ITM<:AbstractInverseRetractionMethod, } - sub_state_storage = maybe_wrap_evaluation_type(sub_state) - return new{P,T,Pr,typeof(sub_state_storage),TStep,TStop,TM,ITM}( + return new{P,T,Pr,St,TStep,TStop,TM,ITM}( p, - initial_vector, + X, sub_problem, - sub_state_storage, + sub_state, stopping_criterion, stepsize, retraction_method, @@ -102,8 +105,15 @@ mutable struct FrankWolfeState{ ) end end -function default_stepsize(::AbstractManifold, ::Type{FrankWolfeState}) - return DecreasingStepsize(; length=2.0, shift=2) +function FrankWolfeState( + M::AbstractManifold, sub_problem; evaluation::E=AllocatingEvaluation(), kwargs... +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return FrankWolfeState(M, sub_problem, cfs; kwargs...) 
+end + +function default_stepsize(M::AbstractManifold, ::Type{FrankWolfeState}) + return DecreasingStepsize(M; length=2.0, shift=2.0) end get_gradient(fws::FrankWolfeState) = fws.X get_iterate(fws::FrankWolfeState) = fws.p @@ -148,12 +158,12 @@ _doc_FW_problem = raw""" """ _doc_FW_sk_default = raw"``s_k = \frac{2}{k+2}``" _doc_Frank_Wolfe_method = """ - Frank_Wolfe_method(M, f, grad_f, p) - Frank_Wolfe_method(M, gradient_objective, p; kwargs...) + Frank_Wolfe_method(M, f, grad_f, p=rand(M)) + Frank_Wolfe_method(M, gradient_objective, p=rand(M); kwargs...) Frank_Wolfe_method!(M, f, grad_f, p; kwargs...) Frank_Wolfe_method!(M, gradient_objective, p; kwargs...) -Perform the Frank-Wolfe algorithm to compute for ``$_l_C_subset_M`` +Perform the Frank-Wolfe algorithm to compute for ``$(_tex(:Cal, "C")) ⊂ $(_tex(:Cal, "M"))`` the constrained problem $_doc_FW_problem @@ -174,58 +184,37 @@ use a retraction and its inverse. # Input -$_arg_M -$_arg_f -$_arg_grad_f -$_arg_p +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :grad_f)) +$(_var(:Argument, :p)) -$_arg_alt_mgo +$(_note(:GradientObjective)) # Keyword arguments -* $_kw_evaluation_default: - $_kw_evaluation $_kw_evaluation_example - -* $_kw_retraction_method_default: - $_kw_retraction_method - -* `stepsize=`[`DecreasingStepsize`](@ref)`(; length=2.0, shift=2)`: - $_kw_stepsize, where the default is the step size $_doc_FW_sk_default - -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(500)`$_sc_any[`StopWhenGradientNormLess`](@ref)`(1.0e-6)`) - $_kw_stopping_criterion - -* $_kw_X_default: - $_kw_X, the evaluated gradient ``$_l_grad f`` evaluated at ``p^{(k)}``. 
- +$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stepsize; default="[`DecreasingStepsize`](@ref)`(; length=2.0, shift=2)`")) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(500)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1.0e-6)`")) * `sub_cost=`[`FrankWolfeCost`](@ref)`(p, X)`: - the cost of the Frank-Wolfe sub problem. $(_kw_used_in("sub_objective")) - + the cost of the Frank-Wolfe sub problem. $(_note(:KeywordUsedIn, "sub_objective")) * `sub_grad=`[`FrankWolfeGradient`](@ref)`(p, X)`: - the gradient of the Frank-Wolfe sub problem. $(_kw_used_in("sub_objective")) - -* $_kw_sub_kwargs_default: $_kw_sub_kwargs + the gradient of the Frank-Wolfe sub problem. $(_note(:KeywordUsedIn, "sub_objective")) +$(_var(:Keyword, :sub_kwargs)) * `sub_objective=`[`ManifoldGradientObjective`](@ref)`(sub_cost, sub_gradient)`: - the objective for the Frank-Wolfe sub problem. $(_kw_used_in("sub_problem")) - -* `sub_problem=`[`DefaultManoptProblem`](@ref)`(M, sub_objective)`): the sub problem to solve. - This can be given in three forms - 1. as an [`AbstractManoptProblem`](@ref), then the `sub_state=` specifies the solver to use - 2. as a closed form solution, as a function evaluating with new allocations `(M, p, X) -> q` that solves the sub problem on `M` given the current iterate `p` and (sub)gradient `X`. - 3. as a closed form solution, as a function `(M, q, p, X) -> q` working in place of `q`. - For points 2 and 3 the `sub_state` has to be set to the corresponding [`AbstractEvaluationType`](@ref), [`AllocatingEvaluation`](@ref) and [`InplaceEvaluation`](@ref), respectively - This keyword takes further into account `sub_kwargs` to evejtually decorate the problem + the objective for the Frank-Wolfe sub problem. 
$(_note(:KeywordUsedIn, "sub_problem")) -* `sub_state= if sub_problem isa Function evaluation else GradientDescentState(M, copy(M,p); kwargs...)`: +$(_var(:Keyword, :sub_problem; default="[`DefaultManoptProblem`](@ref)`(M, sub_objective)`")) +$(_var(:Keyword, :sub_state; default="[`GradientDescentState`](@ref)`(M, copy(M,p))`")) - specify either the solver for a `sub_problem` or the kind of evaluation if the sub problem is given by a closed form solution - this keyword takes into account the `sub_stopping_criterion`, and the `sub_kwargs`, that are also used to potentially decorate the state. +$(_var(:Keyword, :X; add=:as_Gradient)) +$(_var(:Keyword, :stopping_criterion, "sub_stopping_criterion"; default="[`StopAfterIteration`](@ref)`(300)`$(_sc(:Any))[`StopWhenStepsizeLess`](@ref)`(1e-8)`")) + $(_note(:KeywordUsedIn, "sub_state")) -* `sub_stopping_criterion=`[`StopAfterIteration`](@ref)`(300)`$_sc_any[`StopWhenStepsizeLess`](@ref)`(1e-8)`: - $_kw_stopping_criterion for the sub solver. $(_kw_used_in("sub_state")) - -$_kw_others +$(_note(:OtherKeywords)) If you provide the [`ManifoldGradientObjective`](@ref) directly, the `evaluation=` keyword is ignored. The decorations are still applied to the objective. @@ -241,31 +230,19 @@ function Frank_Wolfe_method( M::AbstractManifold, f, grad_f, - p; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) - mgo = ManifoldGradientObjective(f, grad_f; evaluation=evaluation) - return Frank_Wolfe_method(M, mgo, p; evaluation=evaluation, kwargs...) -end -function Frank_Wolfe_method( - M::AbstractManifold, - f, - grad_f, - p::Number; + p=rand(M); evaluation::AbstractEvaluationType=AllocatingEvaluation(), kwargs..., ) - # redefine initial point - q = [p] - f_(M, p) = f(M, p[]) - grad_f_ = _to_mutating_gradient(grad_f, evaluation) - rs = Frank_Wolfe_method(M, f_, grad_f_, q; evaluation=evaluation, kwargs...) 
- #return just a number if the return type is the same as the type of q - return (typeof(q) == typeof(rs)) ? rs[] : rs + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation) + mgo = ManifoldGradientObjective(f_, grad_f_; evaluation=evaluation) + rs = Frank_Wolfe_method(M, mgo, p_; evaluation=evaluation, kwargs...) + return _ensure_matching_output(p, rs) end function Frank_Wolfe_method( - M::AbstractManifold, mgo::O, p; kwargs... + M::AbstractManifold, mgo::O, p=rand(M); kwargs... ) where {O<:Union{ManifoldGradientObjective,AbstractDecoratedManifoldObjective}} q = copy(M, p) return Frank_Wolfe_method!(M, mgo, q; kwargs...) @@ -288,15 +265,14 @@ function Frank_Wolfe_method!( M::AbstractManifold, mgo::O, p; - initial_vector=zero_vector(M, p), #deprecated - X=initial_vector, + X=zero_vector(M, p), evaluation=AllocatingEvaluation(), objective_type=:Riemannian, retraction_method=default_retraction_method(M, typeof(p)), - stepsize::TStep=default_stepsize(M, FrankWolfeState), + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=default_stepsize(M, FrankWolfeState), stopping_criterion::TStop=StopAfterIteration(200) | StopWhenGradientNormLess(1.0e-8) | - StopWhenChangeLess(1.0e-8), + StopWhenChangeLess(M, 1.0e-8), sub_cost=FrankWolfeCost(p, X), sub_grad=FrankWolfeGradient(p, X), sub_kwargs=(;), @@ -312,8 +288,8 @@ function Frank_Wolfe_method!( else decorate_state!( GradientDescentState( - M, - copy(M, p); + M; + p=copy(M, p), stopping_criterion=sub_stopping_criterion, stepsize=default_stepsize( M, GradientDescentState; retraction_method=retraction_method @@ -327,7 +303,6 @@ function Frank_Wolfe_method!( kwargs..., ) where { TStop<:StoppingCriterion, - TStep<:Stepsize, O<:Union{ManifoldGradientObjective,AbstractDecoratedManifoldObjective}, } dmgo = decorate_objective!(M, mgo; objective_type=objective_type, kwargs...) 
@@ -335,12 +310,12 @@ function Frank_Wolfe_method!( sub_state_storage = maybe_wrap_evaluation_type(sub_state) fws = FrankWolfeState( M, - p, sub_problem, sub_state_storage; - initial_vector=initial_vector, + p=p, + X=X, retraction_method=retraction_method, - stepsize=stepsize, + stepsize=_produce_type(stepsize, M), stopping_criterion=stopping_criterion, ) dfws = decorate_state!(fws; kwargs...) diff --git a/src/solvers/Lanczos.jl b/src/solvers/Lanczos.jl index 7484d05fb7..72c4b93a2d 100644 --- a/src/solvers/Lanczos.jl +++ b/src/solvers/Lanczos.jl @@ -9,8 +9,8 @@ Solve the adaptive regularized subproblem with a Lanczos iteration # Fields -* `stop`: the stopping criterion -* `stop_newton`: the stopping criterion for the inner Newton iteration +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :stopping_criterion, "stop_newton", add="used for the inner Newton iteration")) * `σ`: the current regularization parameter * `X`: the iterate * `Lanczos_vectors`: the obtained Lanczos vectors @@ -26,12 +26,11 @@ Solve the adaptive regularized subproblem with a Lanczos iteration ## Keyword arguments -* $_kw_X_default: the iterate using the manifold of the tangent space. +$(_var(:Keyword, :X; add="as the iterate")) * `maxIterLanczos=200`: shortcut to set the maximal number of iterations in the `stopping_criterion=` * `θ=0.5`: set the parameter in the [`StopWhenFirstOrderProgress`](@ref) within the default `stopping_criterion=`. -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(maxIterLanczos)`$_sc_any[`StopWhenFirstOrderProgress`](@ref)`(θ)`: - the stopping criterion for the Lanczos iteration. -* `stopping_criterion_newtown=`[`StopAfterIteration`](@ref)`(200)`: the stopping criterion for the inner Newton iteration. 
+$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(maxIterLanczos)`$(_sc(:Any))[`StopWhenFirstOrderProgress`](@ref)`(θ)`")) +$(_var(:Keyword, :stopping_criterion, "stopping_criterion_newton"; default="[`StopAfterIteration`](@ref)`(200)`", add=" used for the inner Newton iteration")) * `σ=10.0`: specify the regularization parameter """ mutable struct LanczosState{T,R,SC,SCN,B,TM,C} <: AbstractManoptSolverState @@ -53,7 +52,7 @@ function LanczosState( θ=0.5, stopping_criterion::SC=StopAfterIteration(maxIterLanczos) | StopWhenFirstOrderProgress(θ), - stopping_criterion_newtown::SCN=StopAfterIteration(200), + stopping_criterion_newton::SCN=StopAfterIteration(200), σ::R=10.0, ) where {T,SC<:StoppingCriterion,SCN<:StoppingCriterion,R} tridig = spdiagm(maxIterLanczos, maxIterLanczos, [0.0]) @@ -63,7 +62,7 @@ function LanczosState( X, σ, stopping_criterion, - stopping_criterion_newtown, + stopping_criterion_newton, Lanczos_vectors, tridig, coeffs, @@ -79,7 +78,7 @@ function set_iterate!(ls::LanczosState, M, X) ls.X .= X return ls end -function set_manopt_parameter!(ls::LanczosState, ::Val{:σ}, σ) +function set_parameter!(ls::LanczosState, ::Val{:σ}, σ) ls.σ = σ return ls end @@ -253,14 +252,14 @@ solver indicating that the model function at the current (outer) iterate, $_doc_ARC_mdoel -defined on the tangent space ``$(_l_TpM())`` fulfills at the current iterate ``X_k`` that +defined on the tangent space ``$(_math(:TpM))`` fulfills at the current iterate ``X_k`` that $_math_sc_firstorder # Fields * `θ`: the factor ``θ`` in the second condition -* $_field_at_iteration +$(_var(:Field, :at_iteration)) # Constructor diff --git a/src/solvers/LevenbergMarquardt.jl b/src/solvers/LevenbergMarquardt.jl index 542d365cba..4b292ad5ad 100644 --- a/src/solvers/LevenbergMarquardt.jl +++ b/src/solvers/LevenbergMarquardt.jl @@ -11,19 +11,19 @@ Solve an optimization problem of the form $(_doc_LM_formula) -where ``f: $(_l_M) → ℝ^d`` is a continuously differentiable 
function, +where ``f: $(_math(:M)) → ℝ^d`` is a continuously differentiable function, using the Riemannian Levenberg-Marquardt algorithm [Peeters:1993](@cite). The implementation follows Algorithm 1 [AdachiOkunoTakeda:2022](@cite). The second signature performs the optimization in-place of `p`. # Input -$(_arg_M) -* `f`: a cost function ``f: $(_l_M) M→ℝ^d`` +$(_var(:Argument, :M; type=true)) +* `f`: a cost function ``f: $(_math(:M)) → ℝ^d`` * `jacobian_f`: the Jacobian of ``f``. The Jacobian is supposed to accept a keyword argument `basis_domain` which specifies the basis of the tangent space at a given point in which the Jacobian is to be calculated. By default it should be the `DefaultOrthonormalBasis`. -$(_arg_p) +$(_var(:Argument, :p)) * `num_components`: length of the vector returned by the cost function (`d`). By default its value is -1 which means that it is determined automatically by calling `f` one additional time. This is only possible when `evaluation` is `AllocatingEvaluation`, @@ -34,11 +34,10 @@ then the keyword `jacobian_tangent_basis` below is ignored # Keyword arguments -* $(_kw_evaluation_default): $(_kw_evaluation) +$(_var(:Keyword, :evaluation)) * `η=0.2`: scaling factor for the sufficient cost decrease threshold required to accept new proposal points. Allowed range: `0 < η < 1`. * `expect_zero_residual=false`: whether or not the algorithm might expect that the value of residual (objective) at minimum is equal to 0. - $(_kw_stopping_criterion) * `damping_term_min=0.1`: initial (and also minimal) value of the damping term * `β=5.0`: parameter by which the damping term is multiplied when the current new point is rejected * `initial_jacobian_f`: the initial Jacobian of the cost function `f`. @@ -46,12 +45,12 @@ then the keyword `jacobian_tangent_basis` below is ignored * `initial_residual_values`: the initial residual vector of the cost function `f`. By default this is a vector of length `num_components` of similar type as `p`. 
* `jacobian_tangent_basis`: an [`AbstractBasis`](@extref `ManifoldsBase.AbstractBasis`) specifying the basis of the tangent space for `jacobian_f`. -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(1e-12)`: +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-12)`")) -$(_kw_others) +$(_note(:OtherKeywords)) -$(_doc_sec_output) +$(_note(:OutputSection)) """ @doc "$(_doc_LM)" @@ -75,16 +74,9 @@ function LevenbergMarquardt( p, num_components::Int=-1; evaluation::AbstractEvaluationType=AllocatingEvaluation(), - jacB=nothing, - jacobian_tangent_basis::AbstractBasis=if isnothing(jacB) - DefaultOrthonormalBasis() - else - jacB - end, + jacobian_tangent_basis::AbstractBasis=DefaultOrthonormalBasis(), kwargs..., ) - !isnothing(jacB) && - (@warn "The keyword `jacB` is deprecated, use `jacobian_tangent_basis` instead.") if num_components == -1 if evaluation === AllocatingEvaluation() num_components = length(f(M, p)) @@ -121,16 +113,9 @@ function LevenbergMarquardt!( p, num_components::Int=-1; evaluation::AbstractEvaluationType=AllocatingEvaluation(), - jacB=nothing, - jacobian_tangent_basis::AbstractBasis=if isnothing(jacB) - DefaultOrthonormalBasis() - else - jacB - end, + jacobian_tangent_basis::AbstractBasis=DefaultOrthonormalBasis(), kwargs..., ) - !isnothing(jacB) && - (@warn "The keyword `jacB` is deprecated, use `jacobian_tangent_basis` instead.") if num_components == -1 if evaluation === AllocatingEvaluation() num_components = length(f(M, p)) @@ -175,9 +160,9 @@ function LevenbergMarquardt!( nlsp = DefaultManoptProblem(M, dnlso) lms = LevenbergMarquardtState( M, - p, initial_residual_values, initial_jacobian_f; + p=p, β, η, damping_term_min, diff --git a/src/solvers/NelderMead.jl b/src/solvers/NelderMead.jl index 42f734c343..500fc3c3cb 
100644 --- a/src/solvers/NelderMead.jl +++ b/src/solvers/NelderMead.jl @@ -9,7 +9,7 @@ A simplex for the Nelder-Mead algorithm. NelderMeadSimplex(M::AbstractManifold) Construct a simplex using ``d+1`` random points from manifold `M`, -where ``d`` is the $(_link_manifold_dimension("")) of `M`. +where ``d`` is the $(_link(:manifold_dimension; M="")) of `M`. NelderMeadSimplex( M::AbstractManifold, @@ -32,9 +32,6 @@ end function NelderMeadSimplex(M::AbstractManifold) return NelderMeadSimplex([rand(M) for i in 1:(manifold_dimension(M) + 1)]) end -function NelderMeadSimplex(M::AbstractManifold, p::Number, B::AbstractBasis; kwargs...) - return NelderMeadSimplex(M, [p], B; kwargs...) -end function NelderMeadSimplex( M::AbstractManifold, p, @@ -42,11 +39,12 @@ function NelderMeadSimplex( a::Real=0.025, retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), ) + p_ = _ensure_mutating_variable(p) M_dim = manifold_dimension(M) vecs = [ - get_vector(M, p, [ifelse(i == j, a, zero(a)) for i in 1:M_dim], B) for j in 0:M_dim + get_vector(M, p_, [ifelse(i == j, a, zero(a)) for i in 1:M_dim], B) for j in 0:M_dim ] - pts = map(X -> retract(M, p, X, retraction_method), vecs) + pts = map(X -> retract(M, p_, X, retraction_method), vecs) return NelderMeadSimplex(pts) end @@ -63,33 +61,34 @@ of the Euclidean case. The default is given in brackets, the required value rang after the description * `population::`[`NelderMeadSimplex`](@ref): a population (set) of ``d+1`` points ``x_i``, ``i=1,…,n+1``, where ``d`` - is the $(_link_manifold_dimension("")) of `M`. -* $_field_step + is the $(_link(:manifold_dimension; M="")) of `M`. 
+$(_var(:Field, :stepsize)) * `α`: the reflection parameter ``α > 0``: * `γ` the expansion parameter ``γ > 0``: * `ρ`: the contraction parameter, ``0 < ρ ≤ \\frac{1}{2}``, * `σ`: the shrinkage coefficient, ``0 < σ ≤ 1`` -* `p`: a field to store the current best value (initialized to _some_ point here) -* $_field_retr -* $_field_inv_retr +$(_var(:Field, :p; add=" storing the current best point")) +$(_var(:Field, :inverse_retraction_method)) +$(_var(:Field, :retraction_method)) # Constructors - NelderMeadState(M, population::NelderMeadSimplex=NelderMeadSimplex(M)); kwargs...) + NelderMeadState(M::AbstractManifold; kwargs...) Construct a Nelder-Mead Option with a default population (if not provided) of set of `dimension(M)+1` random points stored in [`NelderMeadSimplex`](@ref). # Keyword arguments -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(2000)`$_sc_any[`StopWhenPopulationConcentrated`](@ref)`()`): +* `population=`[`NelderMeadSimplex`](@ref)`(M)` +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(2000)`$(_sc(:Any))[`StopWhenPopulationConcentrated`](@ref)`()`")) a [`StoppingCriterion`](@ref) * `α=1.0`: reflection parameter ``α > 0``: * `γ=2.0` expansion parameter ``γ``: * `ρ=1/2`: contraction parameter, ``0 < ρ ≤ \\frac{1}{2}``, * `σ=1/2`: shrink coefficient, ``0 < σ ≤ 1`` -* $_kw_retraction_method_default: $_kw_retraction_method -* $_kw_inverse_retraction_method_default: $_kw_inverse_retraction_method`inverse_retraction_method=default_inverse_retraction_method(M, typeof(p))`: an inverse retraction to use.
+$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :retraction_method)) * `p=copy(M, population.pts[1])`: initialise the storage for the best point (iterate)¨ """ mutable struct NelderMeadState{ @@ -113,8 +112,8 @@ mutable struct NelderMeadState{ retraction_method::TR inverse_retraction_method::TI function NelderMeadState( - M::AbstractManifold, - population::NelderMeadSimplex{T}=NelderMeadSimplex(M); + M::AbstractManifold; + population::NelderMeadSimplex{T}=NelderMeadSimplex(M), stopping_criterion::StoppingCriterion=StopAfterIteration(2000) | StopWhenPopulationConcentrated(), α=1.0, @@ -185,28 +184,28 @@ _doc_NelderMead = """ NelderMead!(M::AbstractManifold, f, population) NelderMead!(M::AbstractManifold, mco::AbstractManifoldCostObjective, population) -Solve a Nelder-Mead minimization problem for the cost function ``f: $_l_M`` on the +Solve a Nelder-Mead minimization problem for the cost function ``f: $(_tex(:Cal, "M")) → ℝ`` on the manifold `M`. If the initial [`NelderMeadSimplex`](@ref) is not provided, a random set of points is chosen. The compuation can be performed in-place of the `population`. -The algorithm consists of the following steps. Let ``d`` denote the dimension of the manifold ``$_l_M``. +The algorithm consists of the following steps. Let ``d`` denote the dimension of the manifold ``$(_tex(:Cal, "M"))``. 1. Order the simplex vertices ``p_i, i=1,…,d+1`` by increasing cost, such that we have ``f(p_1) ≤ f(p_2) ≤ … ≤ f(p_{d+1})``. -2. Compute the Riemannian center of mass [Karcher:1977](@cite), cf. [`mean`](@extref Statistics.mean-Tuple{AbstractManifold, Vararg{Any}}), ``p_{$(_l_txt("m"))}`` +2. Compute the Riemannian center of mass [Karcher:1977](@cite), cf. [`mean`](@extref Statistics.mean-Tuple{AbstractManifold, Vararg{Any}}), ``p_{$(_tex(:text, "m"))}`` of the simplex vertices ``p_1,…,p_{d+1}``. -3. 
Reflect the point with the worst point at the mean ``p_{$(_l_txt("r"))} = $(_l_retr)_{p_{$(_l_txt("m"))}}\\bigl( - α$(_l_retr)^{-1}_{p_{$(_l_txt("m"))}} (p_{d+1}) \\bigr)`` - If ``f(p_1) ≤ f(p_{$(_l_txt("r"))}) ≤ f(p_{d})`` then set ``p_{d+1} = p_{$(_l_txt("r"))}`` and go to step 1. -4. Expand the simplex if ``f(p_{$(_l_txt("r"))}) < f(p_1)`` by computing the expantion point ``p_{$(_l_txt("e"))} = $(_l_retr)_{p_{$(_l_txt("m"))}}\\bigl( - γα$(_l_retr)^{-1}_{p_{$(_l_txt("m"))}} (p_{d+1}) \\bigr)``, +3. Reflect the worst point at the mean ``p_{$(_tex(:text, "r"))} = $(_tex(:retr))_{p_{$(_tex(:text, "m"))}}\\bigl( - α$(_tex(:invretr))_{p_{$(_tex(:text, "m"))}} (p_{d+1}) \\bigr)`` + If ``f(p_1) ≤ f(p_{$(_tex(:text, "r"))}) ≤ f(p_{d})`` then set ``p_{d+1} = p_{$(_tex(:text, "r"))}`` and go to step 1. +4. Expand the simplex if ``f(p_{$(_tex(:text, "r"))}) < f(p_1)`` by computing the expansion point ``p_{$(_tex(:text, "e"))} = $(_tex(:retr))_{p_{$(_tex(:text, "m"))}}\\bigl( - γα$(_tex(:invretr))_{p_{$(_tex(:text, "m"))}} (p_{d+1}) \\bigr)``, which in this formulation allows to reuse the tangent vector from the inverse retraction from before. - If ``f(p_{$(_l_txt("e"))}) < f(p_{$(_l_txt("r"))})`` then set ``p_{d+1} = p_{$(_l_txt("e"))}`` otherwise set set ``p_{d+1} = p_{$(_l_txt("r"))}``. Then go to Step 1. -5. Contract the simplex if ``f(p_{$(_l_txt("r"))}) ≥ f(p_d)``. - 1. If ``f(p_{$(_l_txt("r"))}) < f(p_{d+1})`` set the step ``s = -ρ`` + If ``f(p_{$(_tex(:text, "e"))}) < f(p_{$(_tex(:text, "r"))})`` then set ``p_{d+1} = p_{$(_tex(:text, "e"))}`` otherwise set ``p_{d+1} = p_{$(_tex(:text, "r"))}``. Then go to Step 1. +5. Contract the simplex if ``f(p_{$(_tex(:text, "r"))}) ≥ f(p_d)``. + 1. If ``f(p_{$(_tex(:text, "r"))}) < f(p_{d+1})`` set the step ``s = -ρ`` 2. otherwise set ``s=ρ``. - Compute the contraction point ``p_{$(_l_txt("c"))} = $(_l_retr)_{p_{$(_l_txt("m"))}}\\bigl(s$(_l_retr)^{-1}_{p_{$(_l_txt("m"))}} p_{d+1} \\bigr)``. - 1.
in this case if ``f(p_{$(_l_txt("c"))}) < f(p_{$(_l_txt("r"))})`` set ``p_{d+1} = p_{$(_l_txt("c"))}`` and go to step 1 - 2. in this case if ``f(p_{$(_l_txt("c"))}) < f(p_{d+1})`` set ``p_{d+1} = p_{$(_l_txt("c"))}`` and go to step 1 + Compute the contraction point ``p_{$(_tex(:text, "c"))} = $(_tex(:retr))_{p_{$(_tex(:text, "m"))}}\\bigl(s$(_tex(:invretr))_{p_{$(_tex(:text, "m"))}} p_{d+1} \\bigr)``. + 1. in this case if ``f(p_{$(_tex(:text, "c"))}) < f(p_{$(_tex(:text, "r"))})`` set ``p_{d+1} = p_{$(_tex(:text, "c"))}`` and go to step 1 + 2. in this case if ``f(p_{$(_tex(:text, "c"))}) < f(p_{d+1})`` set ``p_{d+1} = p_{$(_tex(:text, "c"))}`` and go to step 1 6. Shrink all points (closer to ``p_1``). For all ``i=2,...,d+1`` set - ``p_{i} = $(_l_retr)_{p_{1}}\\bigl( σ$(_l_retr)^{-1}_{p_{1}} p_{i} \\bigr).`` + ``p_{i} = $(_tex(:retr))_{p_{1}}\\bigl( σ$(_tex(:invretr))_{p_{1}} p_{i} \\bigr).`` For more details, see The Euclidean variant in the Wikipedia [https://en.wikipedia.org/wiki/Nelder-Mead_method](https://en.wikipedia.org/wiki/Nelder-Mead_method) @@ -214,25 +213,25 @@ or Algorithm 4.1 in [http://www.optimization-online.org/DB_FILE/2007/08/1742.pdf # Input -$_arg_M -$_arg_f +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) * `population::`[`NelderMeadSimplex`](@ref)`=`[`NelderMeadSimplex`](@ref)`(M)`: an initial simplex of ``d+1`` points, where ``d`` - is the $(_link_manifold_dimension("")) of `M`. + is the $(_link(:manifold_dimension; M="")) of `M`. 
# Keyword arguments -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(2000)`$_sc_any[`StopWhenPopulationConcentrated`](@ref)`()`): +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(2000)`$(_sc(:Any))[`StopWhenPopulationConcentrated`](@ref)`()`")) a [`StoppingCriterion`](@ref) * `α=1.0`: reflection parameter ``α > 0``: * `γ=2.0` expansion parameter ``γ``: * `ρ=1/2`: contraction parameter, ``0 < ρ ≤ \\frac{1}{2}``, * `σ=1/2`: shrink coefficient, ``0 < σ ≤ 1`` -* $_kw_retraction_method_default: $_kw_retraction_method -* $_kw_inverse_retraction_method_default: $_kw_inverse_retraction_method`inverse_retraction_method=default_inverse_retraction_method(M, typeof(p))`: an inverse retraction to use. +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :retraction_method)) -$_kw_others +$(_note(:OtherKeywords)) -$_doc_sec_output +$(_note(:OutputSection)) """ @doc "$(_doc_NelderMead)" @@ -285,8 +284,8 @@ function NelderMead!( dmco = decorate_objective!(M, mco; kwargs...) mp = DefaultManoptProblem(M, dmco) s = NelderMeadState( - M, - population; + M; + population=population, stopping_criterion=stopping_criterion, α=α, γ=γ, diff --git a/src/solvers/adaptive_regularization_with_cubics.jl b/src/solvers/adaptive_regularization_with_cubics.jl index d7e74ba230..9bb025b172 100644 --- a/src/solvers/adaptive_regularization_with_cubics.jl +++ b/src/solvers/adaptive_regularization_with_cubics.jl @@ -9,11 +9,11 @@ A state for the [`adaptive_regularization_with_cubics`](@ref) solver.
* `γ1`, `γ2`: shrinking and expansion factors for regularization parameter `σ` * `H`: the current Hessian evaluation * `s`: the current solution from the subsolver -* $_field_iterate +$(_var(:Field, :p; add=[:as_Iterate])) * `q`: a point for the candidates to evaluate model and ρ -* $_field_gradient +$(_var(:Field, :X; add=[:as_Gradient])) * `s`: the tangent vector step resulting from minimizing the model - problem in the tangent space ``$(_l_TpM())`` + problem in the tangent space ``$(_math(:TpM))`` * `σ`: the current cubic regularization parameter * `σmin`: lower bound for the cubic regularization parameter * `ρ_regularization`: regularization parameter for computing ρ. @@ -22,19 +22,21 @@ A state for the [`adaptive_regularization_with_cubics`](@ref) solver. * `ρ`: the current regularized ratio of actual improvement and model improvement. * `ρ_denominator`: a value to store the denominator from the computation of ρ to allow for a warning or error when this value is non-positive. -* $_field_retr -* $_field_stop -* $_arg_sub_problem -* $_arg_sub_state +$(_var(:Field, :retraction_method)) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :sub_problem)) +$(_var(:Field, :sub_state)) Furthermore the following integral fields are defined # Constructor - AdaptiveRegularizationState(M, p=rand(M); X=zero_vector(M, p); kwargs...) + AdaptiveRegularizationState(M, sub_problem, sub_state; kwargs...) Construct the solver state with all fields stated as keyword arguments and the following defaults +## Keyword arguments + * `η1=0.1` * `η2=0.9` * `γ1=0.1` @@ -42,12 +44,11 @@ Construct the solver state with all fields stated as keyword arguments and the f * `σ=100/manifold_dimension(M)` * `σmin=1e-7 * `ρ_regularization=1e3` -* $_kw_evaluation_default -* $_kw_retraction_method_default -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(100)` -* `sub_objective=nothing` a shortcut to provide a subobjective. 
-* `sub_problem=nothing` is set to [`DefaultManoptProblem`](@ref) on the [`TangentSpace`](@extref ManifoldsBase `ManifoldsBase.TangentSpace`) of `p` if an `sub_objecive` is provided -* `sub_state` is set to [`AllocatingEvaluation`](@ref) if `sub_problem` is a function and to a [`LanczosState`](@ref) on the tangent space otherwise +$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :p)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(100)`")) +$(_var(:Keyword, :X)) """ mutable struct AdaptiveRegularizationState{ P, @@ -80,19 +81,10 @@ end function AdaptiveRegularizationState( M::AbstractManifold, + sub_problem::Pr, + sub_state::St; p::P=rand(M), - X::T=zero_vector(M, p); - sub_objective=nothing, - sub_problem::Pr=if isnothing(sub_objective) - nothing - else - DefaultManoptProblem(TangentSpace(M, copy(M, p)), sub_objective) - end, - sub_state::St=if sub_problem isa Function - AllocatingEvaluation() - else - LanczosState(TangentSpace(M, copy(M, p))) - end, + X::T=zero_vector(M, p), σ::R=100.0 / sqrt(manifold_dimension(M)),# Had this to initial value of 0.01. 
However try same as in MATLAB: 100/sqrt(dim(M)) ρ_regularization::R=1e3, stopping_criterion::SC=StopAfterIteration(100), @@ -106,18 +98,16 @@ function AdaptiveRegularizationState( P, T, R, - Pr<:Union{<:AbstractManoptProblem,<:Function,Nothing}, - St<:Union{<:AbstractManoptSolverState,<:AbstractEvaluationType}, + Pr<:Union{<:AbstractManoptProblem,F} where {F}, + St<:AbstractManoptSolverState, SC<:StoppingCriterion, RTM<:AbstractRetractionMethod, } - isnothing(sub_problem) && error("No sub_problem provided,") - sub_state_storage = maybe_wrap_evaluation_type(sub_state) - return AdaptiveRegularizationState{P,T,Pr,typeof(sub_state_storage),SC,R,RTM}( + return AdaptiveRegularizationState{P,T,Pr,St,SC,R,RTM}( p, X, sub_problem, - sub_state_storage, + sub_state, copy(M, p), copy(M, p, X), copy(M, p, X), @@ -134,7 +124,12 @@ function AdaptiveRegularizationState( γ2, ) end - +function AdaptiveRegularizationState( + M, sub_problem; evaluation::E=AllocatingEvaluation(), kwargs... +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return AdaptiveRegularizationState(M, sub_problem, cfs; kwargs...) +end get_iterate(s::AdaptiveRegularizationState) = s.p function set_iterate!(s::AdaptiveRegularizationState, p) s.p = p @@ -205,7 +200,7 @@ Solve an optimization problem on the manifold `M` by iteratively minimizing $_doc_ARC_mdoel -on the tangent space at the current iterate ``p_k``, where ``X ∈ $(_l_TpM("p_k"))`` and +on the tangent space at the current iterate ``p_k``, where ``X ∈ $(_math(:TpM; p="p_k"))`` and ``σ_k > 0`` is a regularization parameter. 
Let ``Xp^{(k)}`` denote the minimizer of the model ``m_k`` and use the model improvement @@ -213,7 +208,7 @@ Let ``Xp^{(k)}`` denote the minimizer of the model ``m_k`` and use the model imp $_doc_ARC_improvement With two thresholds ``η_2 ≥ η_1 > 0`` -set ``p_{k+1} = $(_l_retr)_{p_k}(X_k)`` if ``ρ ≥ η_1`` +set ``p_{k+1} = $(_tex(:retr))_{p_k}(X_k)`` if ``ρ ≥ η_1`` and reject the candidate otherwise, that is, set ``p_{k+1} = p_k``. Further update the regularization parameter using factors ``0 < γ_1 < 1 < γ_2`` reads @@ -224,49 +219,42 @@ For more details see [AgarwalBoumalBullinsCartis:2020](@cite). # Input -$_arg_M -$_arg_f -$_arg_grad_f -$_arg_Hess_f -$_arg_p +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :grad_f)) +$(_var(:Argument, :Hess_f)) +$(_var(:Argument, :p)) the cost `f` and its gradient and Hessian might also be provided as a [`ManifoldHessianObjective`](@ref) # Keyword arguments -the default values are given in brackets - * `σ=100.0 / sqrt(manifold_dimension(M)`: initial regularization parameter * `σmin=1e-10`: minimal regularization value ``σ_{\\min}`` * `η1=0.1`: lower model success threshold * `η2=0.9`: upper model success threshold * `γ1=0.1`: regularization reduction factor (for the success case) * `γ2=2.0`: regularization increment factor (for the non-success case) -* $_kw_evaluation_default: - $_kw_evaluation +$(_var(:Keyword, :evaluation)) * `initial_tangent_vector=zero_vector(M, p)`: initialize any tangent vector data, * `maxIterLanczos=200`: a shortcut to set the stopping criterion in the sub solver, * `ρ_regularization=1e3`: a regularization to avoid dividing by zero for small values of cost and model -* $_kw_retraction_method_default: - $_kw_retraction_method -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(40)`$_sc_any[`StopWhenGradientNormLess`](@ref)`(1e-9)`$_sc_any[`StopWhenAllLanczosVectorsUsed`](@ref)`(maxIterLanczos)`: - $_kw_stopping_criterion -* $_kw_sub_kwargs_default: - $_kw_sub_kwargs -* 
`sub_objective=nothing`: $_kw_sub_objective - $(_kw_sub_objective_default_text("AdaptiveRagularizationWithCubicsModelObjective")) -* `sub_state=`[`LanczosState`](@ref)`(M, copy(M, p); maxIterLanczos=maxIterLanczos, σ=σ)`: - a state for the subproblem or an [`AbstractEvaluationType`](@ref) if the problem is a function. -* `sub_problem=`[`DefaultManoptProblem`](@ref)`(M, sub_objective)`: the problem (or a function) for the sub problem - -$_kw_others +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(40)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-9)`$(_sc(:Any))[`StopWhenAllLanczosVectorsUsed`](@ref)`(maxIterLanczos)`")) +$(_var(:Keyword, :sub_kwargs)) +* `sub_objective=nothing`: a shortcut to modify the objective of the subproblem used within the `sub_problem=` keyword + By default, this is initialized as an [`AdaptiveRagularizationWithCubicsModelObjective`](@ref), which can further be decorated by using the `sub_kwargs=` keyword. +$(_var(:Keyword, :sub_state; default="[`LanczosState`](@ref)`(M, copy(M,p))`")) +$(_var(:Keyword, :sub_problem; default="[`DefaultManoptProblem`](@ref)`(M, sub_objective)`")) + +$(_note(:OtherKeywords)) If you provide the [`ManifoldGradientObjective`](@ref) directly, the `evaluation=` keyword is ignored. The decorations are still applied to the objective. -$_doc_remark_tutorial_debug +$(_note(:TutorialMode)) -$_doc_sec_output +$(_note(:OutputSection)) """ @doc "$_doc_ARC" @@ -286,32 +274,13 @@ function adaptive_regularization_with_cubics( evaluation::AbstractEvaluationType=AllocatingEvaluation(), kwargs..., ) where {TF,TDF,THF} - mho = ManifoldHessianObjective(f, grad_f, Hess_f; evaluation=evaluation) - return adaptive_regularization_with_cubics(M, mho, p; evaluation=evaluation, kwargs...)
-end -function adaptive_regularization_with_cubics( - M::AbstractManifold, - f::TF, - grad_f::TDF, - Hess_f::THF, - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) where {TF,TDF,THF} - q = [p] - f_(M, p) = f(M, p[]) - Hess_f_ = Hess_f - if evaluation isa AllocatingEvaluation - grad_f_ = (M, p) -> [grad_f(M, p[])] - Hess_f_ = (M, p, X) -> [Hess_f(M, p[], X[])] - else - grad_f_ = (M, X, p) -> (X .= [grad_f(M, p[])]) - Hess_f_ = (M, Y, p, X) -> (Y .= [Hess_f(M, p[], X[])]) - end - rs = adaptive_regularization_with_cubics( - M, f_, grad_f_, Hess_f_, q; evaluation=evaluation, kwargs... - ) - return (typeof(q) == typeof(rs)) ? rs[] : rs + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation) + Hess_f_ = _ensure_mutating_hessian(Hess_f, p, evaluation) + mho = ManifoldHessianObjective(f_, grad_f_, Hess_f_; evaluation=evaluation) + rs = adaptive_regularization_with_cubics(M, mho, p_; evaluation=evaluation, kwargs...) + return _ensure_matching_output(p, rs) end function adaptive_regularization_with_cubics(M::AbstractManifold, f, grad_f; kwargs...) return adaptive_regularization_with_cubics(M, f, grad_f, rand(M); kwargs...) 
@@ -446,10 +415,10 @@ function adaptive_regularization_with_cubics!( dmp = DefaultManoptProblem(M, dmho) arcs = AdaptiveRegularizationState( M, - p, - X; - sub_state=sub_state_storage, - sub_problem=sub_problem, + sub_problem, + sub_state_storage; + p=p, + X=X, σ=σ, ρ_regularization=ρ_regularization, stopping_criterion=stopping_criterion, @@ -476,10 +445,10 @@ function step_solver!(dmp::AbstractManoptProblem, arcs::AdaptiveRegularizationSt # Set point also in the sub problem (eventually the tangent space) get_gradient!(M, arcs.X, mho, arcs.p) # Update base point in manifold - set_manopt_parameter!(arcs.sub_problem, :Manifold, :p, copy(M, arcs.p)) - set_manopt_parameter!(arcs.sub_problem, :Objective, :σ, arcs.σ) + set_parameter!(arcs.sub_problem, :Manifold, :p, copy(M, arcs.p)) + set_parameter!(arcs.sub_problem, :Objective, :σ, arcs.σ) set_iterate!(arcs.sub_state, M, copy(M, arcs.p, arcs.X)) - set_manopt_parameter!(arcs.sub_state, :σ, arcs.σ) + set_parameter!(arcs.sub_state, :σ, arcs.σ) #Solve the `sub_problem` via dispatch depending on type solve_arc_subproblem!(M, arcs.S, arcs.sub_problem, arcs.sub_state, arcs.p) # Compute ρ diff --git a/src/solvers/alternating_gradient_descent.jl b/src/solvers/alternating_gradient_descent.jl index 5c85d01327..f6f7a6a707 100644 --- a/src/solvers/alternating_gradient_descent.jl +++ b/src/solvers/alternating_gradient_descent.jl @@ -5,21 +5,31 @@ Store the fields for an alternating gradient descent algorithm, see also [`alternating_gradient_descent`](@ref). # Fields -* `direction`: (`AlternatingGradient(zero_vector(M, x))` a [`DirectionUpdateRule`](@ref) -* `evaluation_order`: (`:Linear`) whether to use a randomly permuted sequence (`:FixedRandom`), + +* `direction::`[`DirectionUpdateRule`](@ref) +* `evaluation_order::Symbol`: whether to use a randomly permuted sequence (`:FixedRandom`), a per cycle newly permuted sequence (`:Random`) or the default `:Linear` evaluation order. 
-* `inner_iterations`: (`5`) how many gradient steps to take in a component before alternating to the next -* `order` the current permutation -* `retraction_method`: (`default_retraction_method(M, typeof(p))`) a `retraction(M,x,ξ)` to use. -* `stepsize`: ([`ConstantStepsize`](@ref)`(M)`) a [`Stepsize`](@ref) -* `stopping_criterion`: ([`StopAfterIteration`](@ref)`(1000)`) a [`StoppingCriterion`](@ref) -* `p`: the current iterate -* `X`: (`zero_vector(M,p)`) the current gradient tangent vector +* `inner_iterations`: how many gradient steps to take in a component before alternating to the next +* `order`: the current permutation +$(_var(:Field, :retraction_method)) +$(_var(:Field, :stepsize)) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :p; add=[:as_Iterate])) +$(_var(:Field, :X; add=[:as_Gradient])) * `k`, ì`: internal counters for the outer and inner iterations, respectively. # Constructors - AlternatingGradientDescentState(M, p; kwargs...) + AlternatingGradientDescentState(M::AbstractManifold; kwargs...) 
+ +# Keyword arguments +* `inner_iterations=5` +$(_var(:Keyword, :p)) +* `order_type::Symbol=:Linear` +* `order::Vector{<:Int}=Int[]` +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(1000)`")) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, AlternatingGradientDescentState)`")) +$(_var(:Keyword, :X)) Generate the options for point `p` and where `inner_iterations`, `order_type`, `order`, `retraction_method`, `stopping_criterion`, and `stepsize`` are keyword arguments @@ -45,8 +55,8 @@ mutable struct AlternatingGradientDescentState{ inner_iterations::Int end function AlternatingGradientDescentState( - M::AbstractManifold, - p::P; + M::AbstractManifold; + p::P=rand(M), X::T=zero_vector(M, p), inner_iterations::Int=5, order_type::Symbol=:Linear, @@ -58,14 +68,14 @@ function AlternatingGradientDescentState( return AlternatingGradientDescentState{ P, T, - AlternatingGradient, + AlternatingGradientRule, typeof(stopping_criterion), typeof(stepsize), typeof(retraction_method), }( p, X, - AlternatingGradient(zero_vector(M, p)), + _produce_type(AlternatingGradient(; p=p, X=X), M), stopping_criterion, stepsize, order_type, @@ -102,28 +112,66 @@ function get_message(agds::AlternatingGradientDescentState) end """ - AlternatingGradient <: DirectionUpdateRule + AlternatingGradientRule <: AbstractGradientGroupDirectionRule + +Create a functor `(problem, state, k) -> (s, X)` to evaluate the alternating gradient, +that alternates between the components of the gradient and has a field for +partial evaluation of the gradient in-place. + +# Fields + +$(_var(:Field, :X)) + +# Constructor + + AlternatingGradientRule(M::AbstractManifold; p=rand(M), X=zero_vector(M, p)) -The default gradient processor, which just evaluates the (alternating) gradient on one of -the components +Initialize the alternating gradient processor with tangent vector type of `X`, +where both `M` and `p` are just helper variables.
+ +# See also +[`alternating_gradient_descent`](@ref), [`AlternatingGradient`](@ref) """ -struct AlternatingGradient{T} <: AbstractGradientGroupProcessor - dir::T +struct AlternatingGradientRule{T} <: AbstractGradientGroupDirectionRule + X::T +end +function AlternatingGradientRule( + M::AbstractManifold; p=rand(M), X::T=zero_vector(M, p) +) where {T} + return AlternatingGradientRule{T}(X) end -function (ag::AlternatingGradient)( - amp::AbstractManoptProblem, agds::AlternatingGradientDescentState, i +function (ag::AlternatingGradientRule)( + amp::AbstractManoptProblem, agds::AlternatingGradientDescentState, k ) M = get_manifold(amp) # at begin of inner iterations reset internal vector to zero - (i == 1) && zero_vector!(M, ag.dir, agds.p) + (k == 1) && zero_vector!(M, ag.X, agds.p) # update order(k)th component in-place - get_gradient!(amp, ag.dir[M, agds.order[agds.k]], agds.p, agds.order[agds.k]) - return agds.stepsize(amp, agds, i), ag.dir # return current full gradient + get_gradient!(amp, ag.X[M, agds.order[agds.k]], agds.p, agds.order[agds.k]) + return agds.stepsize(amp, agds, k), ag.X # return current full gradient +end + +@doc """ + AlternatingGradient(; kwargs...) + AlternatingGradient(M::AbstractManifold; kwargs...) + +Specify that a gradient-based method should only update parts of the gradient +in order to perform an alternating gradient descent. + +# Keyword arguments + +$(_var(:Keyword, :X, "initial_gradient")) +$(_var(:Keyword, :p; add=:as_Initial)) + +$(_note(:ManifoldDefaultFactory, "AlternatingGradientRule")) +""" +function AlternatingGradient(args...; kwargs...) + return ManifoldDefaultsFactory(Manopt.AlternatingGradientRule, args...; kwargs...) end # update Armijo to work on the kth gradient only.
-function (a::ArmijoLinesearch)( +function (a::ArmijoLinesearchStepsize)( amp::AbstractManoptProblem, agds::AlternatingGradientDescentState, ::Int ) M = get_manifold(amp) @@ -144,38 +192,39 @@ function (a::ArmijoLinesearch)( end function default_stepsize(M::AbstractManifold, ::Type{AlternatingGradientDescentState}) - return ArmijoLinesearch(M) + return ArmijoLinesearchStepsize(M) end function alternating_gradient_descent end function alternating_gradient_descent! end -@doc raw""" +_doc_AGD = """ alternating_gradient_descent(M::ProductManifold, f, grad_f, p=rand(M)) alternating_gradient_descent(M::ProductManifold, ago::ManifoldAlternatingGradientObjective, p) + alternating_gradient_descent!(M::ProductManifold, f, grad_f, p) + alternating_gradient_descent!(M::ProductManifold, ago::ManifoldAlternatingGradientObjective, p) -perform an alternating gradient descent +perform an alternating gradient descent. This can be done in-place of the start point `p` # Input -* `M`: the product manifold ``\mathcal M = \mathcal M_1 × \mathcal M_2 × ⋯ ×\mathcal M_n`` -* `f`: the objective function (cost) defined on `M`. +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) * `grad_f`: a gradient, that can be of two cases - * is a single function returning an `ArrayPartition` or + * is a single function returning an `ArrayPartition` from [`RecursiveArrayTools.jl`](https://docs.sciml.ai/RecursiveArrayTools/stable/array_types/) or * is a vector functions each returning a component part of the whole gradient -* `p`: an initial value ``p_0 ∈ \mathcal M`` +$(_var(:Argument, :p)) -# Optional -* `evaluation`: ([`AllocatingEvaluation`](@ref)) specify whether the gradients work by - allocation (default) form `gradF(M, x)` or [`InplaceEvaluation`](@ref) in place of - the form `gradF!(M, X, x)` (elementwise). 
-* `evaluation_order`: (`:Linear`) whether to use a randomly permuted sequence (`:FixedRandom`), +# Keyword arguments + +$(_var(:Keyword, :evaluation)) +* `evaluation_order=:Linear`: whether to use a randomly permuted sequence (`:FixedRandom`), a per cycle permuted sequence (`:Random`) or the default `:Linear` one. -* `inner_iterations`: (`5`) how many gradient steps to take in a component before alternating to the next -* `stopping_criterion`: ([`StopAfterIteration`](@ref)`(1000)`) a [`StoppingCriterion`](@ref) -* `stepsize`: ([`ArmijoLinesearch`](@ref)`()`) a [`Stepsize`](@ref) -* `order`: (`[1:n]`) the initial permutation, where `n` is the number of gradients in `gradF`. -* `retraction_method`: (`default_retraction_method(M, typeof(p))`) a `retraction(M, p, X)` to use. +* `inner_iterations=5`: how many gradient steps to take in a component before alternating to the next +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(1000)`")) +$(_var(:Keyword, :stepsize; default="[`ArmijoLinesearch`](@ref)`()`")) +* `order=[1:n]`: the initial permutation, where `n` is the number of gradients in `gradF`. +$(_var(:Keyword, :retraction_method)) # Output @@ -186,28 +235,12 @@ usually the obtained (approximate) minimizer, see [`get_solver_return`](@ref) fo The input of each of the (component) gradients is still the whole vector `X`, just that all other then the `i`th input component are assumed to be fixed and just the `i`th components gradient is computed / returned. - """ -alternating_gradient_descent(::AbstractManifold, args...; kwargs...) - -@doc raw""" - alternating_gradient_descent!(M::ProductManifold, f, grad_f, p) - alternating_gradient_descent!(M::ProductManifold, ago::ManifoldAlternatingGradientObjective, p) - -perform a alternating gradient descent in place of `p`.
- -# Input - -* `M`: a product manifold ``\mathcal M`` -* `f`: the objective functioN (cost) -* `grad_f`: a gradient function, that either returns a vector of the subgradients or is - a vector of gradients -* `p`: an initial value ``p_0 ∈ \mathcal M`` -you can also pass a [`ManifoldAlternatingGradientObjective`](@ref) `ago` containing `f` and `grad_f` instead. +@doc "$(_doc_AGD)" +alternating_gradient_descent(::AbstractManifold, args...; kwargs...) -for all optional parameters, see [`alternating_gradient_descent`](@ref). -""" +@doc "$(_doc_AGD)" alternating_gradient_descent!(M::AbstractManifold, args...; kwargs...) function initialize_solver!( diff --git a/src/solvers/augmented_Lagrangian_method.jl b/src/solvers/augmented_Lagrangian_method.jl index fa5f68f94d..125f973642 100644 --- a/src/solvers/augmented_Lagrangian_method.jl +++ b/src/solvers/augmented_Lagrangian_method.jl @@ -2,7 +2,7 @@ # State # -_sc_alm_default = "[`StopAfterIteration`](@ref)`(300)`$_sc_any([`StopWhenSmallerOrEqual](@ref)`(:ϵ, ϵ_min)`$_sc_all[`StopWhenChangeLess`](@ref)`(1e-10) )$_sc_any[`StopWhenChangeLess`](@ref)`" +_sc_alm_default = "[`StopAfterIteration`](@ref)`(300)`$(_sc(:Any))([`StopWhenSmallerOrEqual`](@ref)`(:ϵ, ϵ_min)`$(_sc(:All))[`StopWhenChangeLess`](@ref)`(1e-10) )$(_sc(:Any))[`StopWhenChangeLess`](@ref)`" @doc """ AugmentedLagrangianMethodState{P,T} <: AbstractManoptSolverState @@ -17,32 +17,34 @@ a default value is given in brackets if a parameter can be left out in initializ * `λ`: the Lagrange multiplier with respect to the equality constraints * `λ_max`: an upper bound for the Lagrange multiplier belonging to the equality constraints * `λ_min`: a lower bound for the Lagrange multiplier belonging to the equality constraints -* $_field_p +$(_var(:Field, :p; add=[:as_Iterate])) * `penalty`: evaluation of the current penalty term, initialized to `Inf`.
 * `μ`: the Lagrange multiplier with respect to the inequality constraints
 * `μ_max`: an upper bound for the Lagrange multiplier belonging to the inequality constraints
 * `ρ`: the penalty parameter
-* $_field_sub_problem
-* $_field_sub_state
+$(_var(:Field, :sub_problem))
+$(_var(:Field, :sub_state))
 * `τ`: factor for the improvement of the evaluation of the penalty parameter
 * `θ_ρ`: the scaling factor of the penalty parameter
 * `θ_ϵ`: the scaling factor of the accuracy tolerance
-* $_field_stop
+$(_var(:Field, :stopping_criterion, "stop"))

 # Constructor

-    AugmentedLagrangianMethodState(
-        M::AbstractManifold,
-        co::ConstrainedManifoldObjective,
-        p,
-        sub_problem,
-        sub_state;
-        kwargs...
+    AugmentedLagrangianMethodState(M::AbstractManifold, co::ConstrainedManifoldObjective,
+        sub_problem, sub_state; kwargs...
     )

-construct an augmented Lagrangian method options, where $(_arg_inline_M) and the [`ConstrainedManifoldObjective`](@ref) `co` are used for
+construct the augmented Lagrangian method state, where the manifold `M` and the [`ConstrainedManifoldObjective`](@ref) `co` are used for
 manifold- or objective specific defaults.

+    AugmentedLagrangianMethodState(M::AbstractManifold, co::ConstrainedManifoldObjective,
+        sub_problem; evaluation=AllocatingEvaluation(), kwargs...
+    )
+
+construct the augmented Lagrangian method state, where the manifold `M` and the [`ConstrainedManifoldObjective`](@ref) `co` are used for
+manifold- or objective specific defaults, and `sub_problem` is a function providing a closed-form solution, evaluated as specified by `evaluation`.
+
 ## Keyword arguments

 the following keyword arguments are available to initialise the corresponding fields

@@ -54,6 +56,7 @@ the following keyword arguments are available to initialise the corresponding fi
 * `λ_min=- λ_max`
 * `μ=ones(m)`: `m` is the number of inequality constraints in the [`ConstrainedManifoldObjective`](@ref) `co`.
* `μ_max=20.0` +$(_var(:Keyword, :p; add=:as_Initial)) * `ρ=1.0` * `τ=0.8` * `θ_ρ=0.3` @@ -92,9 +95,9 @@ mutable struct AugmentedLagrangianMethodState{ function AugmentedLagrangianMethodState( M::AbstractManifold, co::ConstrainedManifoldObjective, - p::P, sub_problem::Pr, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}; + sub_state::St; + p::P=rand(M), ϵ::R=1e-3, ϵ_min::R=1e-6, λ::V=ones(length(get_equality_constraint(M, co, p, :))), @@ -110,16 +113,22 @@ mutable struct AugmentedLagrangianMethodState{ stopping_criterion::SC=StopAfterIteration(300) | ( StopWhenSmallerOrEqual(:ϵ, ϵ_min) & - StopWhenChangeLess(1e-10) + StopWhenChangeLess(M, 1e-10) ) | - StopWhenChangeLess(1e-10), + StopWhenChangeLess(M, 1e-10), kwargs..., - ) where {P,Pr<:Union{F,AbstractManoptProblem} where {F},R<:Real,V,SC<:StoppingCriterion} - sub_state_storage = maybe_wrap_evaluation_type(sub_state) - alms = new{P,Pr,typeof(sub_state_storage),R,V,SC}() + ) where { + P, + Pr<:Union{F,AbstractManoptProblem} where {F}, + St<:AbstractManoptSolverState, + R<:Real, + V, + SC<:StoppingCriterion, + } + alms = new{P,Pr,St,R,V,SC}() alms.p = p alms.sub_problem = sub_problem - alms.sub_state = sub_state_storage + alms.sub_state = sub_state alms.ϵ = ϵ alms.ϵ_min = ϵ_min alms.λ_max = λ_max @@ -137,6 +146,16 @@ mutable struct AugmentedLagrangianMethodState{ return alms end end +function AugmentedLagrangianMethodState( + M::AbstractManifold, + co::ConstrainedManifoldObjective, + sub_problem; + evaluation::E=AllocatingEvaluation(), + kwargs..., +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return AugmentedLagrangianMethodState(M, co, sub_problem, cfs; kwargs...) +end get_iterate(alms::AugmentedLagrangianMethodState) = alms.p function set_iterate!(alms::AugmentedLagrangianMethodState, M, p) @@ -212,12 +231,12 @@ This method can work in-place of `p`. 
 The aim of the ALM is to find the solution of the constrained optimisation task

-$_problem_constrained
+$(_problem(:Constrained))

-where `M` is a Riemannian manifold, and ``f``, ``$(_math_sequence("g", "i", "1", "n"))`` and ``$(_math_sequence("h", "j", "1", "m"))
+where `M` is a Riemannian manifold, and ``f``, ``$(_math(:Sequence, "g", "i", "1", "n"))`` and ``$(_math(:Sequence, "h", "j", "1", "m"))``
 are twice continuously differentiable functions from `M` to ℝ.
 In every step ``k`` of the algorithm, the [`AugmentedLagrangianCost`](@ref)
- ``$(_doc_al_Cost("k"))`` is minimized on $_l_M,
+ ``$(_doc_AL_Cost("k"))`` is minimized on $(_tex(:Cal, "M")),
 where ``μ^{(k)} ∈ ℝ^n`` and ``λ^{(k)} ∈ ℝ^m`` are the current iterates of the Lagrange multipliers and ``ρ^{(k)}`` is the current penalty parameter.

 The Lagrange multipliers are then updated by

@@ -228,13 +247,13 @@ and

 $_doc_alm_μ_update

-    where ``λ_{$_l_min} ≤ λ_{$_l_max}`` and ``μ_{$_l_max}`` are the multiplier boundaries.
+    where ``λ_{$(_tex(:text, "min"))} ≤ λ_{$(_tex(:text, "max"))}`` and ``μ_{$(_tex(:text, "max"))}`` are the multiplier boundaries.

 Next, the accuracy tolerance ``ϵ`` is updated as

 $_doc_alm_ε_update

-    where ``ϵ_{$_l_min}`` is the lowest value ``ϵ`` is allowed to become and ``θ_ϵ ∈ (0,1)`` is constant scaling factor.
+    where ``ϵ_{$(_tex(:text, "min"))}`` is the lowest value ``ϵ`` is allowed to become and ``θ_ϵ ∈ (0,1)`` is a constant scaling factor.

 Last, the penalty parameter ``ρ`` is updated as follows: with

@@ -248,9 +267,9 @@
 where ``θ_ρ ∈ (0,1)`` is a constant scaling factor.
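For orientation, the multiplier, tolerance, and penalty updates described above can be sketched in plain Julia; every value below is illustrative and no Manopt types are involved.

```julia
# Illustrative sketch of the ALM parameter updates; the clamp-based updates
# mirror the formulas above, all concrete numbers are made up.
λ_min, λ_max, μ_max = -20.0, 20.0, 20.0
θ_ϵ, ϵ_min, θ_ρ, τ = 0.9, 1e-6, 0.3, 0.8
λ, μ, ρ, ϵ = [1.0], [0.5], 1.0, 1e-3
h, g = [0.1], [-0.2]                      # equality / inequality constraint values
λ = clamp.(λ .+ ρ .* h, λ_min, λ_max)     # update of the equality multipliers
μ = clamp.(μ .+ ρ .* g, 0.0, μ_max)       # update of the inequality multipliers
ϵ = max(ϵ_min, θ_ϵ * ϵ)                   # tighten the accuracy tolerance
σ_new, σ_old = 0.2, 0.15                  # constraint-violation measures
ρ = σ_new > τ * σ_old ? ρ / θ_ρ : ρ       # increase ρ if no sufficient improvement
```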
 # Input

-$_arg_M
-$_arg_f
-$_arg_grad_f
+$(_var(:Argument, :M; type=true))
+$(_var(:Argument, :f))
+$(_var(:Argument, :grad_f))

 # Optional (if not called with the [`ConstrainedManifoldObjective`](@ref) `cmo`)

@@ -264,8 +283,7 @@ Otherwise the problem is not constrained and a better solver would be for exampl

 # Keyword Arguments

-* $_kw_evaluation_default: $_kw_evaluation
-
+$(_var(:Keyword, :evaluation))
 * `ϵ=1e-3`: the accuracy tolerance
 * `ϵ_min=1e-6`: the lower bound for the accuracy tolerance
 * `ϵ_exponent=1/100`: exponent of the ϵ update factor;
@@ -298,26 +316,27 @@ Otherwise the problem is not constrained and a better solver would be for exampl
 * `θ_ϵ=(ϵ_min / ϵ)^(ϵ_exponent)`: the scaling factor of the exactness
 * `sub_cost=`[`AugmentedLagrangianCost`](@ref)`(cmo, ρ, μ, λ)`: use augmented Lagrangian cost, based on the [`ConstrainedManifoldObjective`](@ref)
   built from the functions provided.
-    $(_kw_used_in("sub_problem"))
+    $(_note(:KeywordUsedIn, "sub_problem"))
 * `sub_grad=`[`AugmentedLagrangianGrad`](@ref)`(cmo, ρ, μ, λ)`: use augmented Lagrangian gradient, based on the [`ConstrainedManifoldObjective`](@ref)
   built from the functions provided.
-    $(_kw_used_in("sub_problem"))
+    $(_note(:KeywordUsedIn, "sub_problem"))

-* $_kw_sub_kwargs_default: $_kw_sub_kwargs
+$(_var(:Keyword, :sub_kwargs))

-* `sub_problem=`[`DefaultManoptProblem`](@ref)`(M, `[`ConstrainedManifoldObjective`](@ref)`(subcost, subgrad; evaluation=evaluation))`:
-  problem for the subsolver
-* `sub_state=`[`QuasiNewtonState`](@ref)) using [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref) with [`InverseBFGS`](@ref) and `sub_stopping_criterion` as a stopping criterion.
-  See also `sub_kwargs=`.
+$(_var(:Keyword, :stopping_criterion; default=_sc_alm_default))
+$(_var(:Keyword, :sub_problem; default="[`DefaultManoptProblem`](@ref)`(M, sub_objective)`"))
+$(_var(:Keyword, :sub_state; default="[`QuasiNewtonState`](@ref)", add="as the quasi newton method, the [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref) with [`InverseBFGS`](@ref) is used."))
+* `sub_stopping_criterion=StopAfterIteration(300) | StopWhenGradientNormLess(ϵ) | StopWhenStepsizeLess(1e-8)`: the stopping criterion for the sub solver.

-* `stopping_criterion=$_sc_alm_default`: $_kw_stopping_criterion
 For the `range`s of the constraints' gradient, other power manifold tangent space representations,
 mainly the [`ArrayPowerRepresentation`](@extref Manifolds :jl:type:`Manifolds.ArrayPowerRepresentation`) can be used if the gradients can be computed more efficiently in that representation.

-$(_kw_others)
+$(_note(:OtherKeywords))

-$_doc_sec_output
+$(_note(:OutputSection))
 """

 @doc "$(_doc_alm)"
@@ -331,33 +350,33 @@ function augmented_Lagrangian_method(
 h=nothing,
 grad_g=nothing,
 grad_h=nothing,
-    inequality_constrains::Union{Integer,Nothing}=nothing,
-    equality_constrains::Union{Nothing,Integer}=nothing,
+    inequality_constraints::Union{Integer,Nothing}=nothing,
+    equality_constraints::Union{Nothing,Integer}=nothing,
     kwargs...,
 )
-    q = copy(M, p)
+    p_ = _ensure_mutating_variable(p)
+    f_ = _ensure_mutating_cost(f, p)
+    grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation)
+    g_ = _ensure_mutating_cost(g, p)
+    grad_g_ = _ensure_mutating_gradient(grad_g, p, evaluation)
+    h_ = _ensure_mutating_cost(h, p)
+    grad_h_ = _ensure_mutating_gradient(grad_h, p, evaluation)
+
     cmo = ConstrainedManifoldObjective(
-        f,
-        grad_f,
-        g,
-        grad_g,
-        h,
-        grad_h;
+        f_,
+        grad_f_,
+        g_,
+        grad_g_,
+        h_,
+        grad_h_;
         evaluation=evaluation,
-        inequality_constrains=inequality_constrains,
-        equality_constrains=equality_constrains,
+        inequality_constraints=inequality_constraints,
+        equality_constraints=equality_constraints,
        M=M,
        p=p,
    )
-    return
augmented_Lagrangian_method!( - M, - cmo, - q; - evaluation=evaluation, - equality_constrains=equality_constrains, - inequality_constrains=inequality_constrains, - kwargs..., - ) + rs = augmented_Lagrangian_method(M, cmo, p_; evaluation=evaluation, kwargs...) + return _ensure_matching_output(p, rs) end function augmented_Lagrangian_method( M::AbstractManifold, cmo::O, p=rand(M); kwargs... @@ -365,31 +384,6 @@ function augmented_Lagrangian_method( q = copy(M, p) return augmented_Lagrangian_method!(M, cmo, q; kwargs...) end -function augmented_Lagrangian_method( - M::AbstractManifold, - f::TF, - grad_f::TGF, - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - g=nothing, - grad_g=nothing, - grad_h=nothing, - h=nothing, - kwargs..., -) where {TF,TGF} - q = [p] - f_(M, p) = f(M, p[]) - grad_f_ = _to_mutating_gradient(grad_f, evaluation) - g_ = isnothing(g) ? nothing : (M, p) -> g(M, p[]) - grad_g_ = isnothing(grad_g) ? nothing : _to_mutating_gradient(grad_g, evaluation) - h_ = isnothing(h) ? nothing : (M, p) -> h(M, p[]) - grad_h_ = isnothing(grad_h) ? nothing : _to_mutating_gradient(grad_h, evaluation) - cmo = ConstrainedManifoldObjective( - f_, grad_f_, g_, grad_g_, h_, grad_h_; evaluation=evaluation, M=M, p=p - ) - rs = augmented_Lagrangian_method(M, cmo, q; evaluation=evaluation, kwargs...) - return (typeof(q) == typeof(rs)) ? 
rs[] : rs -end @doc "$(_doc_alm)" function augmented_Lagrangian_method!( @@ -402,15 +396,15 @@ function augmented_Lagrangian_method!( h=nothing, grad_g=nothing, grad_h=nothing, - inequality_constrains=nothing, - equality_constrains=nothing, + inequality_constraints=nothing, + equality_constraints=nothing, kwargs..., ) where {TF,TGF} - if isnothing(inequality_constrains) - inequality_constrains = _number_of_constraints(g, grad_g; M=M, p=p) + if isnothing(inequality_constraints) + inequality_constraints = _number_of_constraints(g, grad_g; M=M, p=p) end - if isnothing(equality_constrains) - equality_constrains = _number_of_constraints(h, grad_h; M=M, p=p) + if isnothing(equality_constraints) + equality_constraints = _number_of_constraints(h, grad_h; M=M, p=p) end cmo = ConstrainedManifoldObjective( f, @@ -420,8 +414,8 @@ function augmented_Lagrangian_method!( h, grad_h; evaluation=evaluation, - equality_constrains=equality_constrains, - inequality_constrains=inequality_constrains, + equality_constraints=equality_constraints, + inequality_constraints=inequality_constraints, M=M, p=p, ) @@ -431,8 +425,8 @@ function augmented_Lagrangian_method!( dcmo, p; evaluation=evaluation, - equality_constrains=equality_constrains, - inequality_constrains=inequality_constrains, + equality_constraints=equality_constraints, + inequality_constraints=inequality_constraints, kwargs..., ) end @@ -465,9 +459,9 @@ function augmented_Lagrangian_method!( StopWhenStepsizeLess(1e-8), sub_state::AbstractManoptSolverState=decorate_state!( QuasiNewtonState( - M, - copy(p); - initial_vector=zero_vector(M, p), + M; + p=copy(M, p), + X=zero_vector(M, p), direction_update=QuasiNewtonLimitedMemoryDirectionUpdate( M, copy(M, p), InverseBFGS(), min(manifold_dimension(M), 30) ), @@ -490,17 +484,18 @@ function augmented_Lagrangian_method!( stopping_criterion::StoppingCriterion=StopAfterIteration(300) | ( StopWhenSmallerOrEqual(:ϵ, ϵ_min) & - StopWhenChangeLess(1e-10) + StopWhenChangeLess(M, 1e-10) ) | 
StopWhenStepsizeLess(1e-10), kwargs..., ) where {O<:Union{ConstrainedManifoldObjective,AbstractDecoratedManifoldObjective}} + sub_state_storage = maybe_wrap_evaluation_type(sub_state) alms = AugmentedLagrangianMethodState( M, cmo, - p, sub_problem, - sub_state; + sub_state_storage; + p=p, ϵ=ϵ, ϵ_min=ϵ_min, λ_max=λ_max, @@ -540,15 +535,15 @@ end function step_solver!(mp::AbstractManoptProblem, alms::AugmentedLagrangianMethodState, iter) M = get_manifold(mp) # use subsolver to minimize the augmented Lagrangian - set_manopt_parameter!(alms.sub_problem, :Objective, :Cost, :ρ, alms.ρ) - set_manopt_parameter!(alms.sub_problem, :Objective, :Cost, :μ, alms.μ) - set_manopt_parameter!(alms.sub_problem, :Objective, :Cost, :λ, alms.λ) - set_manopt_parameter!(alms.sub_problem, :Objective, :Gradient, :ρ, alms.ρ) - set_manopt_parameter!(alms.sub_problem, :Objective, :Gradient, :μ, alms.μ) - set_manopt_parameter!(alms.sub_problem, :Objective, :Gradient, :λ, alms.λ) + set_parameter!(alms.sub_problem, :Objective, :Cost, :ρ, alms.ρ) + set_parameter!(alms.sub_problem, :Objective, :Cost, :μ, alms.μ) + set_parameter!(alms.sub_problem, :Objective, :Cost, :λ, alms.λ) + set_parameter!(alms.sub_problem, :Objective, :Gradient, :ρ, alms.ρ) + set_parameter!(alms.sub_problem, :Objective, :Gradient, :μ, alms.μ) + set_parameter!(alms.sub_problem, :Objective, :Gradient, :λ, alms.λ) set_iterate!(alms.sub_state, M, copy(M, alms.p)) - update_stopping_criterion!(alms, :MinIterateChange, alms.ϵ) + set_parameter!(alms, :StoppingCriterion, :MinIterateChange, alms.ϵ) new_p = get_solver_result(solve!(alms.sub_problem, alms.sub_state)) alms.last_stepsize = distance(M, alms.p, new_p, default_inverse_retraction_method(M)) diff --git a/src/solvers/cma_es.jl b/src/solvers/cma_es.jl index 2b97d8fe99..c481ca6c12 100644 --- a/src/solvers/cma_es.jl +++ b/src/solvers/cma_es.jl @@ -1,14 +1,14 @@ # # State # -@doc raw""" +@doc """ CMAESState{P,T} <: AbstractManoptSolverState State of covariance matrix adaptation 
evolution strategy. # Fields -* `p` the best point found so far +$(_var(:Field, :p; add=" storing the best point found so far")) * `p_obj` objective value at `p` * `μ` parent number * `λ` population size @@ -19,7 +19,6 @@ State of covariance matrix adaptation evolution strategy. * `c_σ` decay rate for the cumulation path for the step-size control * `c_m` learning rate for the mean * `d_σ` damping parameter for step-size update -* `stop` stopping criteria, [`StoppingCriterion`](@ref) * `population` population of the current generation * `ys_c` coordinates of random vectors for the current generation * `covariance_matrix` coordinates of the covariance matrix @@ -30,14 +29,15 @@ State of covariance matrix adaptation evolution strategy. * `worst_fitness_current_gen` worst fitness value of individuals in the current generation * `p_m` point around which the search for new candidates is done * `σ` step size -* `p_σ` coordinates of a vector in ``T_{p_m} \mathcal M`` -* `p_c` coordinates of a vector in ``T_{p_m} \mathcal M`` +* `p_σ` coordinates of a vector in ``$(_math(:TpM; p="p_m"))`` +* `p_c` coordinates of a vector in ``$(_math(:TpM; p="p_m"))`` * `deviations` standard deviations of coordinate RNG * `buffer` buffer for random number generation and `wmean_y_c` of length `n_coords` * `e_mv_norm` expected value of norm of the `n_coords`-variable standard normal distribution * `recombination_weights` recombination weights used for updating covariance matrix -* `retraction_method` an `AbstractRetractionMethod` -* `vector_transport_method` a vector transport to use +$(_var(:Field, :retraction_method)) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :vector_transport_method)) * `basis` a real coefficient basis for covariance matrix * `rng` RNG for generating new points @@ -347,12 +347,12 @@ setting. 
 # Input

-* `M`: a manifold ``$(_l_M) M``
-* `f`: a cost function ``f: $(_l_M)→ℝ`` to find a minimizer ``p^*`` for
+* `M`: a manifold ``$(_math(:M))``
+* `f`: a cost function ``f: $(_math(:M))→ℝ`` to find a minimizer ``p^*`` for

-# Optional
+# Keyword arguments

-* `p_m=`$(_link_rand()): an initial point `p`
+* `p_m=`$(Manopt._link(:rand)): an initial point `p`
 * `σ=1.0`: initial standard deviation
-* `λ`: (`4 + Int(floor(3 * log(manifold_dimension(M))))`population size (can be increased for a more
+* `λ=4 + Int(floor(3 * log(manifold_dimension(M))))`: population size (can be increased for a more
   thorough global search but decreasing is not recommended)
@@ -361,16 +361,16 @@
 * `tol_x=1e-12`: tolerance for the `StopWhenPopulationStronglyConcentrated`, similar to
   absolute difference between subsequent point but actually computed from distribution
   parameters.
-* `stopping_criterion=default_cma_es_stopping_criterion(M, λ; tol_fun=tol_fun, tol_x=tol_x)`:
-* `retraction_method=default_retraction_method(M, typeof(p_m))`:
-* `vector_transport_method=default_vector_transport_method(M, typeof(p_m))`:
+$(_var(:Keyword, :stopping_criterion; default="`default_cma_es_stopping_criterion(M, λ; tol_fun=tol_fun, tol_x=tol_x)`"))
+$(_var(:Keyword, :retraction_method))
+$(_var(:Keyword, :vector_transport_method))
 * `basis=DefaultOrthonormalBasis()`: basis in which the covariance matrix is represented
 * `rng=default_rng()`: random number generator for generating new points
 on `M`

-$(_kw_others)
+$(_note(:OtherKeywords))

-$(_doc_sec_output)
+$(_note(:OutputSection))
 """
 function cma_es(M::AbstractManifold, f; kwargs...)
     mco = ManifoldCostObjective(f)
diff --git a/src/solvers/conjugate_gradient_descent.jl b/src/solvers/conjugate_gradient_descent.jl
index 55af6db4f9..2ecc146fa1 100644
--- a/src/solvers/conjugate_gradient_descent.jl
+++ b/src/solvers/conjugate_gradient_descent.jl
@@ -4,7 +4,9 @@ function default_stepsize(
     retraction_method=default_retraction_method(M),
 )
     # take a default with a slightly defensive initial step size.
-    return ArmijoLinesearch(M; retraction_method=retraction_method, initial_stepsize=1.0)
+    return ArmijoLinesearchStepsize(
+        M; retraction_method=retraction_method, initial_stepsize=1.0
+    )
 end
 function show(io::IO, cgds::ConjugateGradientDescentState)
     i = get_count(cgds, :Iterations)
@@ -35,7 +37,7 @@ p_{k+1} = \operatorname{retr}_{p_k} \bigl( s_kδ_k \bigr),
 """
 _doc_update_delta_k = raw"""
 ````math
-\delta_k=\operatorname{grad}f(p_k) + β_k \delta_{k-1}
+δ_k = \operatorname{grad}f(p_k) + β_k δ_{k-1}
 ````
 """

@@ -49,19 +51,19 @@

 perform a conjugate gradient based descent

 $(_doc_CG_formula)

-where ``$(_l_retr)`` denotes a retraction on the `Manifold` `M`
+where ``$(_tex(:retr))`` denotes a retraction on the `Manifold` `M`
 and one can employ different rules to update the descent direction ``δ_k`` based on
-the last direction ``δ_{k-1}`` and both gradients ``$(_l_grad)f(x_k)``,``$(_l_grad) f(x_{k-1})``.
+the last direction ``δ_{k-1}`` and both gradients ``$(_tex(:grad))f(x_k)`` and ``$(_tex(:grad)) f(x_{k-1})``.
 The [`Stepsize`](@ref) ``s_k`` may be determined by a [`Linesearch`](@ref).

 Alternatively to `f` and `grad_f` you can provide the [`AbstractManifoldGradientObjective`](@ref) `gradient_objective` directly.

-Available update rules are [`SteepestDirectionUpdateRule`](@ref), which yields a [`gradient_descent`](@ref),
-[`ConjugateDescentCoefficient`](@ref) (the default), [`DaiYuanCoefficient`](@ref), [`FletcherReevesCoefficient`](@ref),
+Available update rules are [`SteepestDescentCoefficientRule`](@ref), which yields a [`gradient_descent`](@ref),
+[`ConjugateDescentCoefficient`](@ref) (the default), [`DaiYuanCoefficientRule`](@ref), [`FletcherReevesCoefficient`](@ref),
 [`HagerZhangCoefficient`](@ref), [`HestenesStiefelCoefficient`](@ref),
 [`LiuStoreyCoefficient`](@ref), and [`PolakRibiereCoefficient`](@ref).
-These can all be combined with a [`ConjugateGradientBealeRestart`](@ref) rule.
+These can all be combined with a [`ConjugateGradientBealeRestartRule`](@ref).

 They all compute ``β_k`` such that this algorithm updates the search direction as

@@ -69,29 +71,27 @@
 $(_doc_update_delta_k)

 # Input

-$(_arg_M)
-$(_arg_f)
-$(_arg_grad_f)
-$(_arg_p)
+$(_var(:Argument, :M; type=true))
+$(_var(:Argument, :f))
+$(_var(:Argument, :grad_f))
+$(_var(:Argument, :p))

 # Keyword arguments

-* `coefficient::DirectionUpdateRule=[`ConjugateDescentCoefficient`](@ref)`()`:
+* `coefficient::DirectionUpdateRule=`[`ConjugateDescentCoefficient`](@ref)`()`:
   rule to compute the descent direction update coefficient ``β_k``, as a functor, where
   the resulting function maps are `(amp, cgs, k) -> β` with `amp` an [`AbstractManoptProblem`](@ref),
-  `cgs` is the [`ConjugateGradientDescentState`](@ref), and `i` is the current iterate.
-* $(_kw_evaluation_default): $(_kw_evaluation)
-* $(_kw_retraction_method_default): $(_kw_retraction_method)
-* `stepsize=[`ArmijoLinesearch`](@ref)`(M)`: $_kw_stepsize
-  via [`default_stepsize`](@ref)) passing on the `default_retraction_method`
-* `stopping_criterion=`[`StopAfterIteration`](@ref)`(500)`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(1e-8)`:
-  $(_kw_stopping_criterion)
-* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method)
+  `cgs` is the [`ConjugateGradientDescentState`](@ref), and `k` is the current iterate.
+$(_var(:Keyword, :evaluation))
+$(_var(:Keyword, :retraction_method))
+$(_var(:Keyword, :stepsize; default="[`ArmijoLinesearch`](@ref)`()`"))
+$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(500)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-8)`"))
+$(_var(:Keyword, :vector_transport_method))

 If you provide the [`ManifoldGradientObjective`](@ref) directly, the `evaluation=` keyword
 is ignored. The decorations are still applied to the objective.
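As a plain-Euclidean illustration of the direction update ``δ_k = \operatorname{grad}f(p_k) + β_k δ_{k-1}``, the following dependency-free sketch uses the Fletcher–Reeves coefficient as one choice of ``β_k``; the exact line search is only valid because the example cost is quadratic, and all names are illustrative rather than Manopt API.

```julia
using LinearAlgebra
# Euclidean sketch of the conjugate gradient iteration for f(p) = p'A p / 2,
# so grad f(p) = A p; on ℝⁿ the retraction is plain addition.
function cg_quadratic(A, p)
    g = A * p
    δ = copy(g)
    for k in 1:10
        s = dot(δ, g) / dot(δ, A * δ)      # exact line search along -δ
        p = p - s * δ
        g_new = A * p
        β = dot(g_new, g_new) / dot(g, g)  # Fletcher–Reeves coefficient
        δ = g_new + β * δ
        g = g_new
        norm(g) < 1e-12 && break
    end
    return p
end

cg_quadratic([2.0 0.0; 0.0 8.0], [1.0, 1.0])  # ≈ [0, 0], the minimizer
```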
-$(_doc_sec_output) +$(_note(:OutputSection)) """ @doc "$(_doc_CG)" @@ -102,24 +102,12 @@ end function conjugate_gradient_descent( M::AbstractManifold, f::TF, grad_f::TDF, p; evaluation=AllocatingEvaluation(), kwargs... ) where {TF,TDF} - mgo = ManifoldGradientObjective(f, grad_f; evaluation=evaluation) - return conjugate_gradient_descent(M, mgo, p; evaluation=evaluation, kwargs...) -end -function conjugate_gradient_descent( - M::AbstractManifold, - f::TF, - grad_f::TDF, - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) where {TF,TDF} - # redefine initial point - q = [p] - f_(M, p) = f(M, p[]) - grad_f_ = _to_mutating_gradient(grad_f, evaluation) - rs = conjugate_gradient_descent(M, f_, grad_f_, q; evaluation=evaluation, kwargs...) - #return just a number if the return type is the same as the type of q - return (typeof(q) == typeof(rs)) ? rs[] : rs + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation) + mgo = ManifoldGradientObjective(f_, grad_f_; evaluation=evaluation) + rs = conjugate_gradient_descent(M, mgo, p_; evaluation=evaluation, kwargs...) + return _ensure_matching_output(p, rs) end function conjugate_gradient_descent( M::AbstractManifold, mgo::O, p=rand(M); kwargs... 
@@ -146,9 +134,9 @@ function conjugate_gradient_descent!( M::AbstractManifold, mgo::O, p; - coefficient::DirectionUpdateRule=ConjugateDescentCoefficient(), + coefficient::Union{DirectionUpdateRule,ManifoldDefaultsFactory}=ConjugateDescentCoefficient(), retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), - stepsize::Stepsize=default_stepsize( + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=default_stepsize( M, ConjugateGradientDescentState; retraction_method=retraction_method ), stopping_criterion::StoppingCriterion=StopAfterIteration(500) | @@ -160,11 +148,11 @@ function conjugate_gradient_descent!( dmgo = decorate_objective!(M, mgo; kwargs...) dmp = DefaultManoptProblem(M, dmgo) cgs = ConjugateGradientDescentState( - M, - p; + M; + p=p, stopping_criterion=stopping_criterion, - stepsize=stepsize, - coefficient=coefficient, + stepsize=_produce_type(stepsize, M), + coefficient=_produce_type(coefficient, M), retraction_method=retraction_method, vector_transport_method=vector_transport_method, initial_gradient=initial_gradient, diff --git a/src/solvers/conjugate_residual.jl b/src/solvers/conjugate_residual.jl index 2245f5d145..59fc51ce59 100644 --- a/src/solvers/conjugate_residual.jl +++ b/src/solvers/conjugate_residual.jl @@ -1,53 +1,50 @@ -@doc """ - conjugate_residual(TpM::TangentSpace, A, b, p=rand(TpM)) - conjugate_residual(TpM::TangentSpace, slso::SymmetricLinearSystemObjective, p=rand(TpM)) - conjugate_residual!(TpM::TangentSpace, A, b, p) - conjugate_residual!(TpM::TangentSpace, slso::SymmetricLinearSystemObjective, p) +_doc_conjugate_residual = """ + conjugate_residual(TpM::TangentSpace, A, b, X=zero_vector(TpM)) + conjugate_residual(TpM::TangentSpace, slso::SymmetricLinearSystemObjective, X=zero_vector(TpM)) + conjugate_residual!(TpM::TangentSpace, A, b, X) + conjugate_residual!(TpM::TangentSpace, slso::SymmetricLinearSystemObjective, X) -Compute the solution of ``$(_l_cal("A"))(p)[X] + b(p) = 0_p ``, where +Compute the 
solution of ``$(_tex(:Cal, "A"))(p)[X] + b(p) = 0_p``, where

-* ``$(_l_cal("A"))`` is a linear, symmetric operator on ``$(_l_TpM)``
+* ``$(_tex(:Cal, "A"))`` is a linear, symmetric operator on ``$(_math(:TpM))``
 * ``b`` is a vector field on the manifold
-* ``X ∈ $(_l_TpM)`` is a tangent vector
-* ``0_p`` is the zero vector ``$(_l_TpM)``.
+* ``X ∈ $(_math(:TpM))`` is a tangent vector
+* ``0_p`` is the zero vector in ``$(_math(:TpM))``.

 This implementation follows Algorithm 3 in [LaiYoshise:2024](@cite) and
 is initialised with ``X^{(0)}`` as the zero vector and

-* the initial residual ``r^{(0)} = -b(p) - $(_l_cal("A"))(p)[X^{(0)}]``
+* the initial residual ``r^{(0)} = -b(p) - $(_tex(:Cal, "A"))(p)[X^{(0)}]``
 * the initial conjugate direction ``d^{(0)} = r^{(0)}``
-* initialize ``Y^{(0)} = $(_l_cal("A"))(p)[X^{(0)}]``
+* initialize ``Y^{(0)} = $(_tex(:Cal, "A"))(p)[X^{(0)}]``

then performs the following steps at iteration ``k=0,…`` until the `stopping_criterion` is fulfilled.

-1. compute a step size ``α_k = $(_l_ds)$(_l_frac("⟨ r^{(k)}, $(_l_cal("A"))(p)[r^{(k)}] ⟩_p","⟨ $(_l_cal("A"))(p)[d^{(k)}], $(_l_cal("A"))(p)[d^{(k)}] ⟩_p"))``
+1. compute a step size ``α_k = $(_tex(:displaystyle))$(_tex(:frac, "⟨ r^{(k)}, $(_tex(:Cal, "A"))(p)[r^{(k)}] ⟩_p","⟨ $(_tex(:Cal, "A"))(p)[d^{(k)}], $(_tex(:Cal, "A"))(p)[d^{(k)}] ⟩_p"))``
 2. do a step ``X^{(k+1)} = X^{(k)} + α_kd^{(k)}``
 3. update the residual ``r^{(k+1)} = r^{(k)} + α_k Y^{(k)}``
-4. compute ``Z = $(_l_cal("A"))(p)[r^{(k+1)}]``
-5. Update the conjugate coefficient ``β_k = $(_l_ds)$(_l_frac("⟨ r^{(k+1)}, $(_l_cal("A"))(p)[r^{(k+1)}] ⟩_p", "⟨ r^{(k)}, $(_l_cal("A"))(p)[r^{(k)}] ⟩_p"))``
+4. compute ``Z = $(_tex(:Cal, "A"))(p)[r^{(k+1)}]``
+5. update the conjugate coefficient ``β_k = $(_tex(:displaystyle))$(_tex(:frac, "⟨ r^{(k+1)}, $(_tex(:Cal, "A"))(p)[r^{(k+1)}] ⟩_p", "⟨ r^{(k)}, $(_tex(:Cal, "A"))(p)[r^{(k)}] ⟩_p"))``
 6. update the conjugate direction ``d^{(k+1)} = r^{(k+1)} + β_kd^{(k)}``
 7.
 Update ``Y^{(k+1)} = -Z + β_k Y^{(k)}``

-Note that the right hand side of Step 7 is the same as evaluating ``$(_l_cal("A"))[d^{(k+1)}]``, but avoids the actual evaluation
+Note that the right hand side of Step 7 is the same as evaluating ``$(_tex(:Cal, "A"))[d^{(k+1)}]``, but avoids the actual evaluation.

 # Input

 * `TpM` the [`TangentSpace`](@extref `ManifoldsBase.TangentSpace`) as the domain
 * `A` a symmetric linear operator on the tangent space `(M, p, X) -> Y`
 * `b` a vector field on the tangent space `(M, p) -> X`
-
+* `X`: the initial tangent vector

 # Keyword arguments

-* `evaluation=`[`AllocatingEvaluation`](@ref) specify whether `A` and `b` are implemented allocating or in-place
-* `stopping_criterion::`[`StoppingCriterion`](@ref)`=`[`StopAfterIteration`](@ref)`(`$(_link_manifold_dimension())$_sc_any[`StopWhenRelativeResidualLess`](@ref)`(c,1e-8)`,
-  where `c` is the norm of ``$(_l_norm("b"))``.
-
-# Output
-
-the obtained (approximate) minimizer ``X^*``.
-To obtain the whole final state of the solver, see [`get_solver_return`](@ref) for details.
+$(_var(:Keyword, :evaluation))
+$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(`$(_link(:manifold_dimension))$(_sc(:Any))[`StopWhenRelativeResidualLess`](@ref)`(c,1e-8)`, where `c` is ``$(_tex(:norm,"b"))``"))
+$(_note(:OutputSection))
 """
+
+@doc "$_doc_conjugate_residual"
 conjugate_residual(TpM::TangentSpace, args...; kwargs...)

 function conjugate_residual(
@@ -68,6 +65,9 @@ function conjugate_residual(
     return conjugate_residual!(TpM, slso, Y; kwargs...)
 end

+@doc "$_doc_conjugate_residual"
+conjugate_residual!(TpM::TangentSpace, args...; kwargs...)
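A dependency-free Euclidean sketch of the iteration above, solving ``AX + b = 0`` for a symmetric positive definite matrix as a stand-in for the tangent-space operator; the sign conventions follow the textbook conjugate-residual recurrences rather than the ``Y``-recursion above, and all names are illustrative.

```julia
using LinearAlgebra
# Conjugate-residual sketch for A*X + b = 0 with symmetric A.
function conjugate_residual_euclidean(A, b; tol=1e-10, maxiter=20)
    X = zero(b)
    r = -b - A * X            # initial residual
    d = copy(r)
    Ar, Ad = A * r, A * d
    for k in 1:maxiter
        rAr = dot(r, Ar)
        α = rAr / dot(Ad, Ad) # step size
        X += α * d
        r -= α * Ad
        norm(r) < tol && break
        Ar = A * r
        β = dot(r, Ar) / rAr  # conjugate coefficient
        d = r + β * d
        Ad = Ar + β * Ad      # avoids a second application of A per iteration
    end
    return X
end

X = conjugate_residual_euclidean([4.0 1.0; 1.0 3.0], [1.0, -2.0])
# norm([4.0 1.0; 1.0 3.0] * X + [1.0, -2.0]) ≈ 0
```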
+
 function conjugate_residual!(
     TpM::TangentSpace,
     slso::SymmetricLinearSystemObjective,
diff --git a/src/solvers/convex_bundle_method.jl b/src/solvers/convex_bundle_method.jl
index 93155be188..4a19e036f2 100644
--- a/src/solvers/convex_bundle_method.jl
+++ b/src/solvers/convex_bundle_method.jl
@@ -15,30 +15,36 @@ point type `P` and a tangent vector type `T`
 * `diameter::R`: estimate for the diameter of the level set of the objective function at the starting point
 * `domain`: the domain of ``f``, as a function `(M, p) -> b` that evaluates to true when the current candidate is in the domain of `f`, and false otherwise
 * `g::T`: descent direction
-* $(_field_inv_retr)
+$(_var(:Field, :inverse_retraction_method))
 * `k_max::R`: upper bound on the sectional curvature of the manifold
 * `linearization_errors<:AbstractVector{<:R}`: linearization errors at the last serious step
 * `m::R`: the parameter to test the decrease of the cost: ``f(q_{k+1}) ≤ f(p_k) + m ξ``.
-* $(_field_iterate)
+$(_var(:Field, :p; add=[:as_Iterate]))
 * `p_last_serious::P`: last serious iterate
-* $(_field_retr)
-* $(_field_stop)
+$(_var(:Field, :retraction_method))
+$(_var(:Field, :stopping_criterion, "stop"))
 * `transported_subgradients`: subgradients of the bundle that are transported to `p_last_serious`
-* $(_field_vector_transp)
-* $(_field_subgradient)
-* $(_field_step)
+$(_var(:Field, :vector_transport_method))
+$(_var(:Field, :X; add=[:as_Subgradient]))
+$(_var(:Field, :stepsize))
 * `ε::R`: convex combination of the linearization errors
 * `λ::AbstractVector{<:R}`: convex coefficients from the solution of the subproblem
 * `ξ`: the stopping parameter given by ``ξ = -\\lVert g\\rVert^2 - ε``
-* $(_field_sub_problem)
-* $(_field_sub_state)
+$(_var(:Field, :sub_problem))
+$(_var(:Field, :sub_state))

 # Constructor

-    ConvexBundleMethodState(M::AbstractManifold, p=rand(M); kwargs...)
+    ConvexBundleMethodState(M::AbstractManifold, sub_problem, sub_state; kwargs...)
+ ConvexBundleMethodState(M::AbstractManifold, sub_problem=convex_bundle_method_subsolver; evaluation=AllocatingEvaluation(), kwargs...) Generate the state for the [`convex_bundle_method`](@ref) on the manifold `M` -with initial point `p`. + +## Input + +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :sub_problem)) +$(_var(:Argument, :sub_state)) # Keyword arguments @@ -51,15 +57,13 @@ Most of the following keyword arguments set default values for the fields mentio * `diameter=50.0` * `domain=(M, p) -> isfinite(f(M, p))` * `k_max=0` -* `stepsize=default_stepsize(M, ConvexBundleMethodState)`, which defaults to [`ConstantStepsize`](@ref)`(M)`. -* $(_kw_inverse_retraction_method_default): $(_kw_inverse_retraction_method) -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopWhenLagrangeMultiplierLess`](@ref)`(1e-8)`$(_sc_any)[`StopAfterIteration`](@ref)`(5000)` -* `X=`$(_link_zero_vector()) specify the type of tangent vector to use. -* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method) -* `sub_problem=`[`convex_bundle_method_subsolver`](@ref) -* `sub_state=[`AllocatingEvaluation`](@ref) - +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, ConvexBundleMethodState)`")) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopWhenLagrangeMultiplierLess`](@ref)`(1e-8)`$(_sc(:Any))[`StopAfterIteration`](@ref)`(5000)`")) +* `X=`$(_link(:zero_vector)) specify the type of tangent vector to use. 
+$(_var(:Keyword, :vector_transport_method)) """ mutable struct ConvexBundleMethodState{ P, @@ -106,7 +110,9 @@ mutable struct ConvexBundleMethodState{ ϱ::Nothing# deprecated function ConvexBundleMethodState( M::TM, - p::P=rand(M); + sub_problem::Pr, + sub_state::St; + p::P=rand(M), atol_λ::R=eps(), atol_errors::R=eps(), bundle_cap::I=25, @@ -121,17 +127,13 @@ mutable struct ConvexBundleMethodState{ StopAfterIteration(5000), X::T=zero_vector(M, p), vector_transport_method::VT=default_vector_transport_method(M, typeof(p)), - sub_problem::Pr=convex_bundle_method_subsolver, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}=AllocatingEvaluation(), - k_size=nothing,# deprecated - p_estimate=nothing,# deprecated - ϱ=nothing,# deprecated ) where { D, IR<:AbstractInverseRetractionMethod, P, T, - Pr, + Pr<:Union{AbstractManoptProblem,F} where {F}, + St<:AbstractManoptSolverState, I, TM<:AbstractManifold, TR<:AbstractRetractionMethod, @@ -140,7 +142,6 @@ mutable struct ConvexBundleMethodState{ VT<:AbstractVectorTransportMethod, R<:Real, } - sub_state_storage = maybe_wrap_evaluation_type(sub_state) bundle = Vector{Tuple{P,T}}() g = zero_vector(M, p) last_stepsize = one(R) @@ -149,13 +150,11 @@ mutable struct ConvexBundleMethodState{ ε = zero(R) λ = Vector{R}() ξ = zero(R) - !all(isnothing.([k_size, p_estimate, ϱ])) && - @error "Keyword arguments `k_size`, `p_estimate`, and the field `ϱ` are not used anymore. Use the field `k_max` instead." 
return new{ P, T, Pr, - typeof(sub_state_storage), + St, R, typeof(linearization_errors), typeof(bundle), @@ -192,11 +191,20 @@ mutable struct ConvexBundleMethodState{ ξ, λ, sub_problem, - sub_state_storage, - ϱ,# deprecated + sub_state, ) end end +function ConvexBundleMethodState( + M::AbstractManifold, + sub_problem=convex_bundle_method_subsolver; + evaluation::E=AllocatingEvaluation(), + kwargs..., +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return ConvexBundleMethodState(M, sub_problem, cfs; kwargs...) +end + get_iterate(bms::ConvexBundleMethodState) = bms.p_last_serious function set_iterate!(bms::ConvexBundleMethodState, M, p) copyto!(M, bms.p_last_serious, p) @@ -272,7 +280,7 @@ _doc_convex_bundle_method = """ convex_bundle_method(M, f, ∂f, p) convex_bundle_method!(M, f, ∂f, p) -perform a convex bundle method ``p^{(k+1)} = $(_l_retr)_{p^{(k)}}(-g_k)`` where +perform a convex bundle method ``p^{(k+1)} = $(_tex(:retr))_{p^{(k)}}(-g_k)`` where $(_doc_cbm_gk) @@ -286,10 +294,10 @@ For more details, see [BergmannHerzogJasa:2024](@cite). # Input -* $(_arg_M) -* $(_arg_f) -* $(_arg_subgrad_f) -* $(_arg_p) +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :subgrad_f, _var(:subgrad_f, :symbol))) +$(_var(:Argument, :p)) # Keyword arguments @@ -299,21 +307,19 @@ For more details, see [BergmannHerzogJasa:2024](@cite). * `m=1e-3`: the parameter to test the decrease of the cost: ``f(q_{k+1}) ≤ f(p_k) + m ξ``. * `diameter=50.0`: estimate for the diameter of the level set of the objective function at the starting point. * `domain=(M, p) -> isfinite(f(M, p))`: a function that evaluates to true when the current candidate is in the domain of the objective `f`, and false otherwise. -* $(_kw_evaluation_default): $(_kw_evaluation) +$(_var(:Keyword, :evaluation)) * `k_max=0`: upper bound on the sectional curvature of the manifold.
-* `stepsize=default_stepsize(M, ConvexBundleMethodState)`, which defaults to [`ConstantStepsize`](@ref)`(M)`. -* $(_kw_inverse_retraction_method_default): $(_kw_inverse_retraction_method) -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopWhenLagrangeMultiplierLess`](@ref)`(1e-8)`$(_sc_any)[`StopAfterIteration`](@ref)`(5000)`: - $(_kw_stopping_criterion) -* `X=`$(_link_zero_vector()) specify the type of tangent vector to use. -* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method) -* `sub_problem=`[`convex_bundle_method_subsolver`](@ref): a Manopt problem or a closed form solution as a function for the sub problem -* `sub_state=[`AllocatingEvaluation`](@ref): specify a solver for the sub problem or how the closed form solution function is evaluated. - -$(_kw_others) - -$(_doc_sec_output) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, ConvexBundleMethodState)`")) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopWhenLagrangeMultiplierLess`](@ref)`(1e-8)`$(_sc(:Any))[`StopAfterIteration`](@ref)`(5000)`")) +$(_var(:Keyword, :vector_transport_method)) +$(_var(:Keyword, :sub_problem; default="[`convex_bundle_method_subsolver`](@ref)")) +$(_var(:Keyword, :sub_state; default="[`AllocatingEvaluation`](@ref)")) +$(_var(:Keyword, :X)) + +$(_note(:OtherKeywords)) + +$(_note(:OutputSection)) """ @doc "$(_doc_convex_bundle_method)" @@ -337,7 +343,7 @@ function convex_bundle_method!( domain=(M, p) -> isfinite(f(M, p)), m::R=1e-3, k_max=0, - stepsize::Stepsize=DomainBackTrackingStepsize(0.5), + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=DomainBackTrackingStepsize(0.5), debug=[DebugWarnIfLagrangeMultiplierIncreases()], evaluation::AbstractEvaluationType=AllocatingEvaluation(), inverse_retraction_method::IR=default_inverse_retraction_method(M, typeof(p)), @@ -348,10 +354,7 @@ function
convex_bundle_method!( vector_transport_method::VTransp=default_vector_transport_method(M, typeof(p)), sub_problem=convex_bundle_method_subsolver, sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}=evaluation, - k_size=nothing,# deprecated - p_estimate=nothing,# deprecated - ϱ=nothing,# deprecated - kwargs..., #especially may contain debug + kwargs..., ) where {R<:Real,TF,TdF,TRetr,IR,VTransp} sgo = ManifoldSubgradientObjective(f, ∂f!!; evaluation=evaluation) dsgo = decorate_objective!(M, sgo; kwargs...) @@ -359,7 +362,9 @@ function convex_bundle_method!( sub_state_storage = maybe_wrap_evaluation_type(sub_state) bms = ConvexBundleMethodState( M, - p; + sub_problem, + maybe_wrap_evaluation_type(sub_state); + p=p, atol_λ=atol_λ, atol_errors=atol_errors, bundle_cap=bundle_cap, @@ -367,16 +372,11 @@ function convex_bundle_method!( domain=domain, m=m, k_max=k_max, - stepsize=stepsize, + stepsize=_produce_type(stepsize, M), inverse_retraction_method=inverse_retraction_method, retraction_method=retraction_method, stopping_criterion=stopping_criterion, vector_transport_method=vector_transport_method, - sub_problem=sub_problem, - sub_state=sub_state_storage, - k_size=k_size,# deprecated - p_estimate=p_estimate,# deprecated - ϱ=ϱ,# deprecated ) bms = decorate_state!(bms; debug=debug, kwargs...) return get_solver_return(solve!(mp, bms)) diff --git a/src/solvers/cyclic_proximal_point.jl b/src/solvers/cyclic_proximal_point.jl index ce8e2740d6..4882af23dc 100644 --- a/src/solvers/cyclic_proximal_point.jl +++ b/src/solvers/cyclic_proximal_point.jl @@ -18,31 +18,30 @@ end _doc_CPPA = """ cyclic_proximal_point(M, f, proxes_f, p; kwargs...) cyclic_proximal_point(M, mpo, p; kwargs...) - cyclic_proximal_point!(M, f, proxes_f, p; kwargs...) - cyclic_proximal_point!(M, mpo, p; kwargs...) + cyclic_proximal_point!(M, f, proxes_f; kwargs...) + cyclic_proximal_point!(M, mpo; kwargs...) perform a cyclic proximal point algorithm. This can be done in-place of `p`. 
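The `cyclic_proximal_point` signature documented here can be exercised with a short usage sketch (not part of the patch): computing a Riemannian mean of points on the sphere, assuming `Manifolds.jl` provides `Sphere`, `distance`, and `shortest_geodesic`, and using the standard closed-form proximal step `t = λ/(1+λ)` of a halved squared distance.

```julia
using Manopt, Manifolds

M = Sphere(2)
data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# cost: sum of halved squared distances to the data points
f(M, p) = sum(distance(M, p, q)^2 / 2 for q in data)
# each summand (1/2) d(p, q)^2 has a closed-form prox: a geodesic step towards q
proxes_f = [(M, λ, p) -> shortest_geodesic(M, p, q, λ / (1 + λ)) for q in data]
p0 = 1 / sqrt(3) .* [1.0, 1.0, 1.0]
p_mean = cyclic_proximal_point(M, f, proxes_f, p0)
```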
# Input -* $(_arg_M) -* `f`: a cost function ``f: $(_l_M) M→ℝ`` to minimize +$(_var(:Argument, :M; type=true)) +* `f`: a cost function ``f: $(_math(:M)) M→ℝ`` to minimize * `proxes_f`: an Array of proximal maps (`Function`s) `(M,λ,p) -> q` or `(M, q, λ, p) -> q` for the summands of ``f`` (see `evaluation`) -* $(_arg_p) where `f` and the proximal maps `proxes_f` can also be given directly as a [`ManifoldProximalMapObjective`](@ref) `mpo` # Keyword arguments -* $(_kw_evaluation_default): $(_kw_evaluation) +$(_var(:Keyword, :evaluation)) * `evaluation_order=:Linear`: whether to use a randomly permuted sequence (`:FixedRandom`), a per-cycle permuted sequence (`:Random`), or the default linear one. * `λ=iter -> 1/iter`: a function returning the (square summable but not summable) sequence of ``λ_i`` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(5000)`$(_sc_any)[`StopWhenChangeLess`](@ref)`(1e-12)`): $(_kw_stopping_criterion) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(5000)`$(_sc(:Any))[`StopWhenChangeLess`](@ref)`(1e-12)`")) -$(_kw_others) +$(_note(:OtherKeywords)) -$(_doc_sec_output) +$(_note(:OutputSection)) """ @doc "$(_doc_CPPA)" @@ -55,27 +54,12 @@ function cyclic_proximal_point( evaluation::AbstractEvaluationType=AllocatingEvaluation(), kwargs..., ) - mpo = ManifoldProximalMapObjective(f, proxes_f; evaluation=evaluation) - return cyclic_proximal_point(M, mpo, p; evaluation=evaluation, kwargs...) -end -function cyclic_proximal_point( - M::AbstractManifold, - f, - proxes_f::Union{Tuple,AbstractVector}, - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) - q = [p] - f_(M, p) = f(M, p[]) - if evaluation isa AllocatingEvaluation - proxes_f_ = [(M, λ, p) -> [pf(M, λ, p[])] for pf in proxes_f] - else - proxes_f_ = [(M, q, λ, p) -> (q .= [pf(M, λ, p[])]) for pf in proxes_f] - end - rs = cyclic_proximal_point(M, f_, proxes_f_, q; evaluation=evaluation, kwargs...)
- #return just a number if the return type is the same as the type of q - return (typeof(q) == typeof(rs)) ? rs[] : rs + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + proxes_f_ = [_ensure_mutating_prox(prox_f, p, evaluation) for prox_f in proxes_f] + mpo = ManifoldProximalMapObjective(f_, proxes_f_; evaluation=evaluation) + rs = cyclic_proximal_point(M, mpo, p_; evaluation=evaluation, kwargs...) + return _ensure_matching_output(p, rs) end function cyclic_proximal_point( M::AbstractManifold, mpo::O, p; kwargs... @@ -103,14 +87,18 @@ function cyclic_proximal_point!( p; evaluation_order::Symbol=:Linear, stopping_criterion::StoppingCriterion=StopAfterIteration(5000) | - StopWhenChangeLess(1e-12), + StopWhenChangeLess(M, 1e-12), λ=i -> 1 / i, kwargs..., ) where {O<:Union{ManifoldProximalMapObjective,AbstractDecoratedManifoldObjective}} dmpo = decorate_objective!(M, mpo; kwargs...) dmp = DefaultManoptProblem(M, dmpo) cpps = CyclicProximalPointState( - M, p; stopping_criterion=stopping_criterion, λ=λ, evaluation_order=evaluation_order + M; + p=p, + stopping_criterion=stopping_criterion, + λ=λ, + evaluation_order=evaluation_order, ) dcpps = decorate_state!(cpps; kwargs...) solve!(dmp, dcpps) diff --git a/src/solvers/difference-of-convex-proximal-point.jl b/src/solvers/difference-of-convex-proximal-point.jl index be5893e867..1bb4daa3c0 100644 --- a/src/solvers/difference-of-convex-proximal-point.jl +++ b/src/solvers/difference-of-convex-proximal-point.jl @@ -8,28 +8,45 @@ It comes in two forms, depending on the realisation of the `subproblem`. 
# Fields -* $(_field_inv_retr) -* $(_field_retr) -* `p`, `q`, `r`: the current iterate, the gradient step and the prox, respectively - their type is set by initializing `p` -* $(_field_step) -* $(_field_stop) +$(_var(:Field, :inverse_retraction_method)) +$(_var(:Field, :retraction_method)) +$(_var(:Field, :p; add=[:as_Iterate])) +$(_var(:Field, :p, "q"; add=" storing the gradient step")) +$(_var(:Field, :p, "r"; add=" storing the result of the proximal map")) +$(_var(:Field, :stepsize)) +$(_var(:Field, :stopping_criterion, "stop")) * `X`, `Y`: the current gradient and descent direction, respectively; their common type is set by the keyword `X` -* `sub_problem`: an [`AbstractManoptProblem`](@ref) problem or a function `(M, p, X) -> q` or `(M, q, p, X)` for the a closed form solution of the sub problem -* `sub_state`: an [`AbstractManoptSolverState`](@ref) for the subsolver or an [`AbstractEvaluationType`](@ref) in case the sub problem is provided as a function +$(_var(:Field, :sub_problem)) +$(_var(:Field, :sub_state)) # Constructor - DifferenceOfConvexProximalState(M, p; kwargs...) + DifferenceOfConvexProximalState(M::AbstractManifold, sub_problem, sub_state; kwargs...) + +construct a difference of convex proximal point state + + DifferenceOfConvexProximalState(M::AbstractManifold, sub_problem; + evaluation=AllocatingEvaluation(), kwargs... +) + +construct a difference of convex proximal point state, where `sub_problem` is a closed form solution with `evaluation` as type of evaluation.
+ +## Input + +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :sub_problem)) +$(_var(:Argument, :sub_state)) # Keyword arguments -* $(_kw_inverse_retraction_method_default): $(_kw_inverse_retraction_method) -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stepsize=`[`ConstantStepsize`](@ref)`(M)`: $(_kw_stepsize) -* `stopping_criterion=`[StopWhenChangeLess`](@ref)`(1e-8)`: $(_kw_stopping_criterion) -* $(_kw_X_default): $(_kw_X) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :retraction_method)) + +$(_var(:Keyword, :stepsize; default="[`ConstantLength`](@ref)`()`")) +$(_var(:Keyword, :stopping_criterion; default="[`StopWhenChangeLess`](@ref)`(1e-8)`")) +$(_var(:Keyword, :X; add=:as_Memory)) """ mutable struct DifferenceOfConvexProximalState{ P, @@ -55,33 +72,33 @@ mutable struct DifferenceOfConvexProximalState{ stop::SC function DifferenceOfConvexProximalState( M::AbstractManifold, - p::P, sub_problem::Pr, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}; + sub_state::St; + p::P=rand(M), X::T=zero_vector(M, p), stepsize::S=ConstantStepsize(M), - stopping_criterion::SC=StopWhenChangeLess(1e-8), + stopping_criterion::SC=StopWhenChangeLess(M, 1e-8), inverse_retraction_method::I=default_inverse_retraction_method(M), retraction_method::R=default_retraction_method(M, typeof(p)), λ::Fλ=i -> 1, ) where { P, T, - Pr, + Pr<:Union{AbstractManoptProblem,F} where {F}, S<:Stepsize, + St<:AbstractManoptSolverState, SC<:StoppingCriterion, I<:AbstractInverseRetractionMethod, R<:AbstractRetractionMethod, Fλ, } - sub_state_storage = maybe_wrap_evaluation_type(sub_state) - return new{P,T,Pr,typeof(sub_state_storage),S,SC,R,I,Fλ}( + return new{P,T,Pr,St,S,SC,R,I,Fλ}( λ, p, copy(M, p), copy(M, p), sub_problem, - sub_state_storage, + sub_state, X, retraction_method, inverse_retraction_method, @@ -90,11 +107,11 @@ mutable struct DifferenceOfConvexProximalState{ ) end end -# no point -> add
point function DifferenceOfConvexProximalState( - M::AbstractManifold, sub_problem, sub_state; kwargs... -) - return DifferenceOfConvexProximalState(M, rand(M), sub_problem, sub_state; kwargs...) + M::AbstractManifold, sub_problem; evaluation::E=AllocatingEvaluation(), kwargs... +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return DifferenceOfConvexProximalState(M, sub_problem, cfs; kwargs...) end get_iterate(dcps::DifferenceOfConvexProximalState) = dcps.p function set_iterate!(dcps::DifferenceOfConvexProximalState, M, p) @@ -146,23 +163,23 @@ _doc_DCPPA = """ Compute the difference of convex proximal point algorithm [SouzaOliveira:2015](@cite) to minimize ```math - $(_l_argmin)_{p∈$(_l_M)} g(p) - h(p) + $(_tex(:argmin))_{p∈$(_math(:M))} g(p) - h(p) ``` where you have to provide the subgradient ``∂h`` of ``h`` and either -* the proximal map ``$(_l_prox)_{λg}`` of `g` as a function `prox_g(M, λ, p)` or `prox_g(M, q, λ, p)` +* the proximal map ``$(_tex(:prox))_{λg}`` of `g` as a function `prox_g(M, λ, p)` or `prox_g(M, q, λ, p)` * the functions `g` and `grad_g` to compute the proximal map using a sub solver * your own sub-solver, specified by `sub_problem=`and `sub_state=` This algorithm performs the following steps given a start point `p`= ``p^{(0)}``. Then repeat for ``k=0,1,…`` -1. ``X^{(k)} ∈ $(_l_grad) h(p^{(k)})`` -2. ``q^{(k)} = $(_l_retr)_{p^{(k)}}(λ_kX^{(k)})`` -3. ``r^{(k)} = $(_l_prox)_{λ_kg}(q^{(k)})`` -4. ``X^{(k)} = $(_l_retr)^{-1}_{p^{(k)}}(r^{(k)})`` +1. ``X^{(k)} ∈ $(_tex(:grad)) h(p^{(k)})`` +2. ``q^{(k)} = $(_tex(:retr))_{p^{(k)}}(λ_kX^{(k)})`` +3. ``r^{(k)} = $(_tex(:prox))_{λ_kg}(q^{(k)})`` +4. ``X^{(k)} = $(_tex(:invretr))_{p^{(k)}}(r^{(k)})`` 5. Compute a stepsize ``s_k`` and -6. set ``p^{(k+1)} = $(_l_retr)_{p^{(k)}}(s_kX^{(k)})``. +6. set ``p^{(k+1)} = $(_tex(:retr))_{p^{(k)}}(s_kX^{(k)})``. until the `stopping_criterion` is fulfilled. 
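The six steps of the proximal point scheme above can be tried with a deliberately trivial DC splitting (`h ≡ 0`, so its subgradient is the zero vector); this sketch is not part of the patch and assumes `Manifolds.jl`. Note that the squared distance used for `g` is only geodesically convex near `q1` on the sphere.

```julia
using Manopt, Manifolds

M = Sphere(2)
q1 = [1.0, 0.0, 0.0]
# g(p) = d(p, q1)^2 with gradient -2 log_p q1; h ≡ 0, so grad h vanishes
g(M, p) = distance(M, p, q1)^2
grad_g(M, p) = -2 * log(M, p, q1)
grad_h(M, p) = zero_vector(M, p)
p0 = [0.0, 0.0, 1.0]
# providing g and grad_g sets up the subsolver for the proximal map automatically
p_star = difference_of_convex_proximal_point(M, grad_h, p0; g=g, grad_g=grad_g)
```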
@@ -175,102 +192,77 @@ DC functions is obtained for ``s_k = 1`` and one can hence employ usual line searches. * `λ`: (`k -> 1/2`) a function returning the sequence of prox parameters ``λ_k`` * `cost=nothing`: provide the cost `f`, for debug reasons / analysis -* $(_kw_evaluation_default): $(_kw_evaluation) -* `gradient=nothing`: specify ``$(_l_grad) f``, for debug / analysis +$(_var(:Keyword, :evaluation)) +* `gradient=nothing`: specify ``$(_tex(:grad)) f``, for debug / analysis or enhancing the `stopping_criterion` * `prox_g=nothing`: specify a proximal map for the sub problem _or_ both of the following * `g=nothing`: specify the function `g`. * `grad_g=nothing`: specify the gradient of `g`. If both `g` and `grad_g` are specified, a subsolver is automatically set up. -* $(_kw_inverse_retraction_method_default); $(_kw_inverse_retraction_method) -* $(_kw_retraction_method_default); $(_kw_retraction_method) -* `stepsize=`[`ConstantStepsize`](@ref)`(M)`): $(_kw_stepsize) -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$(_sc_any)[`StopWhenChangeLess`](@ref)`(1e-8)`): - $(_kw_stopping_criterion) - A [`StopWhenGradientNormLess`](@ref)`(1e-8)` is added with $(_sc_any), when a `gradient` is provided. +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stepsize; default="[`ConstantLength`](@ref)`()`")) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`$(_sc(:Any))[`StopWhenChangeLess`](@ref)`(1e-8)`")) + A [`StopWhenGradientNormLess`](@ref)`(1e-8)` is added with $(_sc(:Any)), when a `gradient` is provided. * `sub_cost=`[`ProximalDCCost`](@ref)`(g, copy(M, p), λ(1))`: cost to be used within the default `sub_problem` that is initialized as soon as `g` is provided.
- $(_kw_used_in("sub_objective")) + $(_note(:KeywordUsedIn, "sub_objective")) * `sub_grad=`[`ProximalDCGrad`](@ref)`(grad_g, copy(M, p), λ(1); evaluation=evaluation)`: gradient to be used within the default `sub_problem`, that is initialized as soon as `grad_g` is provided. - $(_kw_used_in("sub_objective")) + $(_note(:KeywordUsedIn, "sub_objective")) * `sub_hess`: (a finite difference approximation using `sub_grad` by default): specify a Hessian of the `sub_cost`, which the default solver, see `sub_state=`, needs. -* $(_kw_sub_kwargs_default): $(_kw_sub_kwargs) +$(_var(:Keyword, :sub_kwargs)) * `sub_objective`: a gradient or Hessian objective based on `sub_cost=`, `sub_grad=`, and `sub_hess` if provided; the objective used within `sub_problem`. - $(_kw_used_in("sub_problem")) -* `sub_problem=`[`DefaultManoptProblem`](@ref)`(M, sub_objective)`: - specify a manopt problem or a function for the sub-solver runs. - You can also provide a function for a closed form solution. Then `evaluation=` is taken into account for the form of this function. -* `sub_state`([`GradientDescentState`](@ref) or [`TrustRegionsState`](@ref) if `sub_hessian`): - the subsolver to be used when solving the sub problem. - By default this is also decorated using the `sub_kwargs`. - if the `sub_problem` if a function (a closed form solution), this is set to `evaluation` - and can be changed to the evaluation type of the closed form solution accordingly.
-* `sub_stopping_criterion`: ([`StopAfterIteration`](@ref)`(300)`$(_sc_any)`[`StopWhenGradientNormLess`](@ref)`(1e-8)`: - a stopping criterion used withing the default `sub_state=` - $(_kw_used_in("sub_state")) - -$(_kw_others) - -$(_doc_sec_output) + $(_note(:KeywordUsedIn, "sub_problem")) +$(_var(:Keyword, :sub_problem; default="[`DefaultManoptProblem`](@ref)`(M, sub_objective)`")) +$(_var(:Keyword, :sub_state; default="([`GradientDescentState`](@ref) or [`TrustRegionsState`](@ref) if `sub_hessian` is provided)")) +$(_var(:Keyword, :stopping_criterion, "sub_stopping_criterion"; default="[`StopAfterIteration`](@ref)`(300)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-8)`")) + $(_note(:KeywordUsedIn, "sub_state")) + +$(_note(:OtherKeywords)) + +$(_note(:OutputSection)) """ @doc "$(_doc_DCPPA)" difference_of_convex_proximal_point(M::AbstractManifold, args...; kwargs...) -function difference_of_convex_proximal_point(M::AbstractManifold, grad_h; kwargs...) - return difference_of_convex_proximal_point( - M::AbstractManifold, grad_h, rand(M); kwargs... - ) -end function difference_of_convex_proximal_point( M::AbstractManifold, grad_h, - p; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), + p=rand(M); cost=nothing, - gradient=nothing, - kwargs..., -) - mdcpo = ManifoldDifferenceOfConvexProximalObjective( - grad_h; cost=cost, gradient=gradient, evaluation=evaluation - ) - return difference_of_convex_proximal_point( - M, mdcpo, p; evaluation=evaluation, kwargs... - ) -end -function difference_of_convex_proximal_point( - M::AbstractManifold, - grad_h, - p::Number; evaluation::AbstractEvaluationType=AllocatingEvaluation(), - cost=nothing, gradient=nothing, g=nothing, grad_g=nothing, prox_g=nothing, kwargs..., ) - q = [p] - cost_ = isnothing(cost) ? nothing : (M, p) -> cost(M, p[]) - grad_h_ = _to_mutating_gradient(grad_h, evaluation) - g_ = isnothing(g) ? nothing : (M, p) -> g(M, p[]) - gradient_ = isnothing(gradient) ?
nothing : _to_mutating_gradient(gradient, evaluation) - grad_g_ = isnothing(grad_g) ? nothing : _to_mutating_gradient(grad_g, evaluation) - prox_g_ = isnothing(prox_g) ? nothing : _to_mutating_gradient(prox_g, evaluation) + p_ = _ensure_mutating_variable(p) + cost_ = _ensure_mutating_cost(cost, p) + grad_h_ = _ensure_mutating_gradient(grad_h, p, evaluation) + g_ = _ensure_mutating_cost(g, p) + gradient_ = _ensure_mutating_gradient(gradient, p, evaluation) + grad_g_ = _ensure_mutating_gradient(grad_g, p, evaluation) + prox_g_ = _ensure_mutating_prox(prox_g, p, evaluation) + mdcpo = ManifoldDifferenceOfConvexProximalObjective( + grad_h_; cost=cost_, gradient=gradient_, evaluation=evaluation + ) rs = difference_of_convex_proximal_point( M, - grad_h_, - q; + mdcpo, + p_; cost=cost_, evaluation=evaluation, gradient=gradient_, g=g_, grad_g=grad_g_, prox_g=prox_g_, + kwargs..., ) - return (typeof(q) == typeof(rs)) ? rs[] : rs + return _ensure_matching_output(p, rs) end function difference_of_convex_proximal_point( @@ -313,11 +305,13 @@ function difference_of_convex_proximal_point!( inverse_retraction_method=default_inverse_retraction_method(M), objective_type=:Riemannian, retraction_method=default_retraction_method(M, typeof(p)), - stepsize=ConstantStepsize(M), + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=ConstantLength(M), stopping_criterion=if isnothing(get_gradient_function(mdcpo)) - StopAfterIteration(300) | StopWhenChangeLess(1e-9) + StopAfterIteration(300) | StopWhenChangeLess(M, 1e-9) else - StopAfterIteration(300) | StopWhenChangeLess(1e-9) | StopWhenGradientNormLess(1e-9) + StopAfterIteration(300) | + StopWhenChangeLess(M, 1e-9) | + StopWhenGradientNormLess(1e-9) end, sub_cost=isnothing(g) ? 
nothing : ProximalDCCost(g, copy(M, p), λ(1)), sub_grad=if isnothing(grad_g) @@ -363,17 +357,20 @@ function difference_of_convex_proximal_point!( decorate_state!( if isnothing(sub_hess) GradientDescentState( - M, copy(M, p); stopping_criterion=sub_stopping_criterion, sub_kwargs... + M; + p=copy(M, p), + stopping_criterion=sub_stopping_criterion, + sub_kwargs..., ) else TrustRegionsState( M, - copy(M, p), DefaultManoptProblem( TangentSpace(M, copy(M, p)), TrustRegionModelObjective(sub_objective), ), - TruncatedConjugateGradientState(TangentSpace(M, p); sub_kwargs...), + TruncatedConjugateGradientState(TangentSpace(M, p); sub_kwargs...); + p=copy(M, p), ) end; sub_kwargs..., @@ -400,11 +397,11 @@ function difference_of_convex_proximal_point!( dmp = DefaultManoptProblem(M, dmdcpo) dcps = DifferenceOfConvexProximalState( M, - p, sub_problem, maybe_wrap_evaluation_type(sub_state); + p=p, X=X, - stepsize=stepsize, + stepsize=_produce_type(stepsize, M), stopping_criterion=stopping_criterion, inverse_retraction_method=inverse_retraction_method, retraction_method=retraction_method, @@ -475,10 +472,10 @@ function step_solver!( # do a step in that direction retract!(M, dcps.q, dcps.p, dcps.λ(k) * dcps.X, dcps.retraction_method) # use this point (q) for the proximal map - set_manopt_parameter!(dcps.sub_problem, :Objective, :Cost, :p, dcps.q) - set_manopt_parameter!(dcps.sub_problem, :Objective, :Cost, :λ, dcps.λ(k)) - set_manopt_parameter!(dcps.sub_problem, :Objective, :Gradient, :p, dcps.q) - set_manopt_parameter!(dcps.sub_problem, :Objective, :Gradient, :λ, dcps.λ(k)) + set_parameter!(dcps.sub_problem, :Objective, :Cost, :p, dcps.q) + set_parameter!(dcps.sub_problem, :Objective, :Cost, :λ, dcps.λ(k)) + set_parameter!(dcps.sub_problem, :Objective, :Gradient, :p, dcps.q) + set_parameter!(dcps.sub_problem, :Objective, :Gradient, :λ, dcps.λ(k)) set_iterate!(dcps.sub_state, M, copy(M, dcps.q)) solve!(dcps.sub_problem, dcps.sub_state) copyto!(M, dcps.r, 
get_solver_result(dcps.sub_state)) @@ -492,14 +489,3 @@ function step_solver!( end return dcps end -# -# Deprecated old variants with `prox_g` as a parameter -# -@deprecate difference_of_convex_proximal_point( - M::AbstractManifold, prox_g, grad_h, p; kwargs... -) difference_of_convex_proximal_point( - M::AbstractManifold, grad_h, p; prox_g=prox_g, kwargs... -) -@deprecate difference_of_convex_proximal_point!(M, grad_h, prox_g, p; kwargs...) difference_of_convex_proximal_point!( - M, grad_h, p; prox_g=prox_g, kwargs... -) diff --git a/src/solvers/difference_of_convex_algorithm.jl b/src/solvers/difference_of_convex_algorithm.jl index b56bc1ef8d..c8a72c79ec 100644 --- a/src/solvers/difference_of_convex_algorithm.jl +++ b/src/solvers/difference_of_convex_algorithm.jl @@ -8,16 +8,16 @@ It comes in two forms, depending on the realisation of the `subproblem`. # Fields -* $(_field_iterate) -* $(_field_subgradient) -* $(_field_sub_problem) -* $(_field_sub_state) -* $(_field_stop) +$(_var(:Field, :p; add=[:as_Iterate])) +$(_var(:Field, :X; add=[:as_Subgradient])) +$(_var(:Field, :sub_problem)) +$(_var(:Field, :sub_state)) +$(_var(:Field, :stopping_criterion, "stop")) The sub task consists of a method to solve ```math - $(_l_argmin)_{q∈$(_l_M)}\\ g(p) - ⟨X, $(_l_log)_p q⟩ + $(_tex(:argmin))_{q∈$(_math(:M))}\\ g(p) - ⟨X, $(_tex(:log))_p q⟩ ``` is needed. Besides a problem and a state, one can also provide a function and @@ -26,8 +26,8 @@ a closed form solution for the sub task. # Constructors - DifferenceOfConvexState(M, p, sub_problem, sub_state; kwargs...) - DifferenceOfConvexState(M, p, sub_solver; evaluation=InplaceEvaluation(), kwargs...) + DifferenceOfConvexState(M, sub_problem, sub_state; kwargs...) + DifferenceOfConvexState(M, sub_solver; evaluation=InplaceEvaluation(), kwargs...) 
Generate the state either using a solver from Manopt, given by an [`AbstractManoptProblem`](@ref) `sub_problem` and an [`AbstractManoptSolverState`](@ref) `sub_state`, @@ -37,11 +37,12 @@ Here the elements passed are the current iterate `p` and the subgradient `X` of ## further keyword arguments -* `initial_vector=`$(_link_zero_vector()): how to initialize the inner gradient tangent vector -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`: a stopping criterion +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`")) +$(_var(:Keyword, :X; add=:as_Memory)) """ mutable struct DifferenceOfConvexState{ - Pr,St<:AbstractManoptSolverState,P,T,SC<:StoppingCriterion + P,T,Pr,St<:AbstractManoptSolverState,SC<:StoppingCriterion } <: AbstractSubProblemSolverState p::P X::T @@ -50,32 +51,28 @@ mutable struct DifferenceOfConvexState{ stop::SC function DifferenceOfConvexState( M::AbstractManifold, - p::P, sub_problem::Pr, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}; - initial_vector::T=zero_vector(M, p), - stopping_criterion::SC=StopAfterIteration(300) | StopWhenChangeLess(1e-9), - ) where {P,Pr<:AbstractManoptProblem,T,SC<:StoppingCriterion} - sub_state_storage = maybe_wrap_evaluation_type(sub_state) - return new{Pr,typeof(sub_state_storage),P,T,SC}( - p, initial_vector, sub_problem, sub_state, stopping_criterion - ) - end - # Function - function DifferenceOfConvexState( - M::AbstractManifold, - p::P, - sub_problem::S; - initial_vector::T=zero_vector(M, p), - stopping_criterion::SC=StopAfterIteration(300) | StopWhenChangeLess(1e-9), - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - ) where {P,S<:Function,T,SC<:StoppingCriterion} - sub_state_storage = maybe_wrap_evaluation_type(evaluation) - return new{S,typeof(sub_state_storage),P,T,SC}( - p, initial_vector, sub_problem, sub_state_storage, stopping_criterion - ) + sub_state::St; + p::P=rand(M), + 
X::T=zero_vector(M, p), + stopping_criterion::SC=StopAfterIteration(300) | StopWhenChangeLess(M, 1e-9), + ) where { + P, + T, + Pr<:Union{AbstractManoptProblem,F} where {F}, + St<:AbstractManoptSolverState, + SC<:StoppingCriterion, + } + return new{P,T,Pr,St,SC}(p, X, sub_problem, sub_state, stopping_criterion) end end +function DifferenceOfConvexState( + M::AbstractManifold, sub_problem; evaluation::E=AllocatingEvaluation(), kwargs... +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return DifferenceOfConvexState(M, sub_problem, cfs; kwargs...) +end + get_iterate(dcs::DifferenceOfConvexState) = dcs.p function set_iterate!(dcs::DifferenceOfConvexState, M, p) copyto!(M, dcs.p, p) @@ -119,7 +116,7 @@ _doc_DoC = """ Compute the difference of convex algorithm [BergmannFerreiraSantosSouza:2023](@cite) to minimize ```math - $(_l_argmin)_{p∈$(_l_M)}\\ g(p) - h(p) + $(_tex(:argmin))_{p∈$(_math(:M))}\\ g(p) - h(p) ``` where you need to provide ``f(p) = g(p) - h(p)``, ``g`` and the subdifferential ``∂h`` of ``h``. @@ -130,102 +127,77 @@ Then repeat for ``k=0,1,…`` 1. Take ``X^{(k)} ∈ ∂h(p^{(k)})`` 2. Set the next iterate to the solution of the subproblem ```math - p^{(k+1)} ∈ $(_l_argmin)_{q ∈ $(_l_M)} g(q) - ⟨X^{(k)}, $(_l_log)_{p^{(k)}}q⟩ + p^{(k+1)} ∈ $(_tex(:argmin))_{q ∈ $(_math(:M))} g(q) - ⟨X^{(k)}, $(_tex(:log))_{p^{(k)}}q⟩ ``` until the stopping criterion (see the `stopping_criterion` keyword is fulfilled. # Keyword arguments -* $(_kw_evaluation_default): $(_kw_evaluation) -* `gradient=nothing`: specify ``$(_l_grad) f``, for debug / analysis or enhancing the `stopping_criterion=` +$(_var(:Keyword, :evaluation)) +* `gradient=nothing`: specify ``$(_tex(:grad)) f``, for debug / analysis or enhancing the `stopping_criterion=` * `grad_g=nothing`: specify the gradient of `g`. If specified, a subsolver is automatically set up. 
-* `initial_vector=`$(_link_zero_vector()): initialise the inner tangent vector to store the subgradient result. -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$(_sc_any)[`StopWhenChangeLess`](@ref)`(1e-8)`: - $(_kw_stopping_criterion) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`$(_sc(:Any))[`StopWhenChangeLess`](@ref)`(1e-8)`")) * `g=nothing`: specify the function `g`. If specified, a subsolver is automatically set up. * `sub_cost=`[`LinearizedDCCost`](@ref)`(g, p, initial_vector)`: a cost to be used within the default `sub_problem`. - $(_kw_used_in("sub_objective")) + $(_note(:KeywordUsedIn, "sub_objective")) * `sub_grad=`[`LinearizedDCGrad`](@ref)`(grad_g, p, initial_vector; evaluation=evaluation)`: gradient to be used within the default `sub_problem`. - $(_kw_used_in("sub_objective")) + $(_note(:KeywordUsedIn, "sub_objective")) * `sub_hess`: (a finite difference approximation using `sub_grad` by default): specify a Hessian of the `sub_cost`, which the default solver, see `sub_state=`, needs. - $(_kw_used_in("sub_objective")) -* $(_kw_sub_kwargs_default): $(_kw_sub_kwargs) + $(_note(:KeywordUsedIn, "sub_objective")) +$(_var(:Keyword, :sub_kwargs)) * `sub_objective`: a gradient or Hessian objective based on `sub_cost=`, `sub_grad=`, and `sub_hess` if provided; the objective used within `sub_problem`. - $(_kw_used_in("sub_problem")) -* `sub_problem=`[`DefaultManoptProblem`](@ref)`(M, sub_objective)`: - specify a manopt problem or a function for the sub-solver runs. - You can also provide a function for a closed form solution. Then `evaluation=` is taken into account for the form of this function. -* `sub_state`([`GradientDescentState`](@ref) or [`TrustRegionsState`](@ref) if `sub_hessian`): - the subsolver to be used when solving the sub problem. - By default this is also decorated using the `sub_kwargs`.
-  if the `sub_problem` if a function (a closed form solution), this is set to `evaluation`
-  and can be changed to the evaluation type of the closed form solution accordingly.
-* `sub_stopping_criterion=`[`StopAfterIteration`](@ref)`(300)`$(_sc_any)[`StopWhenStepsizeLess`](@ref)`(1e-9)`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(1e-9)`:
+    $(_note(:KeywordUsedIn, "sub_problem"))
+$(_var(:Keyword, :sub_state; default="([`GradientDescentState`](@ref) or [`TrustRegionsState`](@ref) if `sub_hess` is provided)"))
+$(_var(:Keyword, :sub_problem; default="[`DefaultManoptProblem`](@ref)`(M, sub_objective)`"))
+* `sub_stopping_criterion=`[`StopAfterIteration`](@ref)`(300)`$(_sc(:Any))[`StopWhenStepsizeLess`](@ref)`(1e-9)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-9)`:
   a stopping criterion used withing the default `sub_state=`
-    $(_kw_used_in("sub_state"))
+    $(_note(:KeywordUsedIn, "sub_state"))
 * `sub_stepsize=`[`ArmijoLinesearch`](@ref)`(M)`) specify a step size used within the `sub_state`.
-    $(_kw_used_in("sub_state"))
+    $(_note(:KeywordUsedIn, "sub_state"))
+$(_var(:Keyword, :X; add=:as_Memory))
 
-$(_kw_others)
+$(_note(:OtherKeywords))
 
-$(_doc_sec_output)
+$(_note(:OutputSection))
 """
 
 @doc "$(_doc_DoC)"
 difference_of_convex_algorithm(M::AbstractManifold, args...; kwargs...)
-function difference_of_convex_algorithm(M::AbstractManifold, f, g, ∂h; kwargs...)
-    return difference_of_convex_algorithm(M::AbstractManifold, f, g, ∂h, rand(M); kwargs...)
-end function difference_of_convex_algorithm( M::AbstractManifold, f, g, ∂h, - p; + p=rand(M); evaluation::AbstractEvaluationType=AllocatingEvaluation(), + grad_g=nothing, gradient=nothing, kwargs..., ) + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + g_ = _ensure_mutating_cost(g, p) + gradient_ = _ensure_mutating_gradient(gradient, p, evaluation) + grad_g_ = _ensure_mutating_gradient(grad_g, p, evaluation) + ∂h_ = _ensure_mutating_gradient(∂h, p, evaluation) mdco = ManifoldDifferenceOfConvexObjective( - f, ∂h; gradient=gradient, evaluation=evaluation - ) - return difference_of_convex_algorithm( - M, mdco, p; g=g, evaluation=evaluation, gradient=gradient, kwargs... + f_, ∂h_; gradient=gradient_, evaluation=evaluation ) -end -function difference_of_convex_algorithm( - M::AbstractManifold, - f, - g, - ∂h, - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - grad_g=nothing, - gradient=nothing, - kwargs..., -) - q = [p] - f_(M, p) = f(M, p[]) - g_(M, p) = g(M, p[]) - gradient_ = isnothing(gradient) ? nothing : _to_mutating_gradient(gradient, evaluation) - grad_g_ = isnothing(grad_g) ? nothing : _to_mutating_gradient(grad_g, evaluation) - ∂h_ = isnothing(grad_g) ? nothing : _to_mutating_gradient(∂h, evaluation) rs = difference_of_convex_algorithm( M, - f_, - g_, - ∂h_, - q; + mdco, + p_; + g=g_, + evaluation=evaluation, gradient=gradient_, grad_g=grad_g_, - evaluation=evaluation, kwargs..., ) - return (typeof(q) == typeof(rs)) ? rs[] : rs + return _ensure_matching_output(p, rs) end function difference_of_convex_algorithm( M::AbstractManifold, mdco::O, p; kwargs... 
@@ -261,25 +233,25 @@ function difference_of_convex_algorithm!( g=nothing, grad_g=nothing, gradient=nothing, - initial_vector=zero_vector(M, p), + X=zero_vector(M, p), objective_type=:Riemannian, stopping_criterion=if isnothing(gradient) - StopAfterIteration(300) | StopWhenChangeLess(1e-9) + StopAfterIteration(300) | StopWhenChangeLess(M, 1e-9) else - StopAfterIteration(300) | StopWhenChangeLess(1e-9) | StopWhenGradientNormLess(1e-9) + StopAfterIteration(300) | + StopWhenChangeLess(M, 1e-9) | + StopWhenGradientNormLess(1e-9) end, # Subsolver Magic Cascade. sub_cost=if isnothing(g) nothing else - LinearizedDCCost(g, copy(M, p), copy(M, p, initial_vector)) + LinearizedDCCost(g, copy(M, p), copy(M, p, X)) end, sub_grad=if isnothing(grad_g) nothing else - LinearizedDCGrad( - grad_g, copy(M, p), copy(M, p, initial_vector); evaluation=evaluation - ) + LinearizedDCGrad(grad_g, copy(M, p), copy(M, p, X); evaluation=evaluation) end, sub_hess=ApproxHessianFiniteDifference(M, copy(M, p), sub_grad; evaluation=evaluation), sub_kwargs=(;), @@ -314,13 +286,16 @@ function difference_of_convex_algorithm!( decorate_state!( if isnothing(sub_hess) GradientDescentState( - M, copy(M, p); stopping_criterion=sub_stopping_criterion, sub_kwargs... + M; + p=copy(M, p), + stopping_criterion=sub_stopping_criterion, + sub_kwargs..., ) else TrustRegionsState( M, - copy(M, p), sub_objective; + p=copy(M, p), stopping_criterion=sub_stopping_criterion, sub_kwargs..., ) @@ -332,35 +307,23 @@ function difference_of_convex_algorithm!( ) where {O<:Union{ManifoldDifferenceOfConvexObjective,AbstractDecoratedManifoldObjective}} dmdco = decorate_objective!(M, mdco; objective_type=objective_type, kwargs...) dmp = DefaultManoptProblem(M, dmdco) - if isnothing(sub_problem) - error( - """ - Subproblem not correctly initialized. 
Please provide _either_ - * a `sub_problem=` to be solved - * a `sub_objective` to automatically generate the sub problem, - * `sub_grad=` (as well as the usually given `sub_cost=`) to automatically generate the sub objective _or_ - * `grad_g=` keywords to automatically generate the sub problems gradient. - """, - ) - elseif sub_problem isa AbstractManoptProblem - dcs = DifferenceOfConvexState( - M, - p, - sub_problem, - sub_state; - stopping_criterion=stopping_criterion, - initial_vector=initial_vector, - ) - else - dcs = DifferenceOfConvexState( - M, - p, - sub_problem; - evaluation=evaluation, - stopping_criterion=stopping_criterion, - initial_vector=initial_vector, - ) - end + isnothing(sub_problem) && error( + """ + Subproblem not correctly initialized. Please provide _either_ + * a `sub_problem=` to be solved + * a `sub_objective` to automatically generate the sub problem, + * `sub_grad=` (as well as the usually given `sub_cost=`) to automatically generate the sub objective _or_ + * `grad_g=` keywords to automatically generate the sub problems gradient. + """, + ) + dcs = DifferenceOfConvexState( + M, + sub_problem, + maybe_wrap_evaluation_type(sub_state); + p=p, + stopping_criterion=stopping_criterion, + X=X, + ) ddcs = decorate_state!(dcs; kwargs...) 
solve!(dmp, ddcs) return get_solver_return(get_objective(dmp), ddcs) @@ -371,10 +334,10 @@ end function step_solver!(amp::AbstractManoptProblem, dcs::DifferenceOfConvexState, kw) M = get_manifold(amp) get_subtrahend_gradient!(amp, dcs.X, dcs.p) - set_manopt_parameter!(dcs.sub_problem, :Objective, :Cost, :p, dcs.p) - set_manopt_parameter!(dcs.sub_problem, :Objective, :Cost, :X, dcs.X) - set_manopt_parameter!(dcs.sub_problem, :Objective, :Gradient, :p, dcs.p) - set_manopt_parameter!(dcs.sub_problem, :Objective, :Gradient, :X, dcs.X) + set_parameter!(dcs.sub_problem, :Objective, :Cost, :p, dcs.p) + set_parameter!(dcs.sub_problem, :Objective, :Cost, :X, dcs.X) + set_parameter!(dcs.sub_problem, :Objective, :Gradient, :p, dcs.p) + set_parameter!(dcs.sub_problem, :Objective, :Gradient, :X, dcs.X) set_iterate!(dcs.sub_state, M, copy(M, dcs.p)) solve!(dcs.sub_problem, dcs.sub_state) # call the subsolver # copy result from subsolver to current iterate @@ -390,9 +353,9 @@ end # function step_solver!( amp::AbstractManoptProblem, - dcs::DifferenceOfConvexState{F,ClosedFormSubSolverState{InplaceEvaluation}}, + dcs::DifferenceOfConvexState{P,T,F,ClosedFormSubSolverState{InplaceEvaluation}}, i, -) where {F} +) where {P,T,F} M = get_manifold(amp) get_subtrahend_gradient!(amp, dcs.X, dcs.p) # evaluate grad F in place for O.X dcs.sub_problem(M, dcs.p, dcs.p, dcs.X) # evaluate the closed form solution and store the result in p @@ -403,9 +366,9 @@ end # function step_solver!( amp::AbstractManoptProblem, - dcs::DifferenceOfConvexState{F,ClosedFormSubSolverState{AllocatingEvaluation}}, + dcs::DifferenceOfConvexState{P,T,F,ClosedFormSubSolverState{AllocatingEvaluation}}, i, -) where {F} +) where {P,T,F} M = get_manifold(amp) get_subtrahend_gradient!(amp, dcs.X, dcs.p) # evaluate grad F in place for O.X # run the subsolver in-place of a copy of the current iterate and copy it back to the current iterate diff --git a/src/solvers/exact_penalty_method.jl b/src/solvers/exact_penalty_method.jl 
index 92cf0a914e..0b657b9064 100644
--- a/src/solvers/exact_penalty_method.jl
+++ b/src/solvers/exact_penalty_method.jl
@@ -7,11 +7,11 @@ Describes the exact penalty method, with
 * `ϵ`: the accuracy tolerance
 * `ϵ_min`: the lower bound for the accuracy tolerance
-* $(_field_p)
+$(_var(:Field, :p; add=[:as_Iterate]))
 * `ρ`: the penalty parameter
-* $(_field_sub_problem)
-* $(_field_sub_state)
-* $(_field_stop)
+$(_var(:Field, :sub_problem))
+$(_var(:Field, :sub_state))
+$(_var(:Field, :stopping_criterion, "stop"))
 * `u`: the smoothing parameter and threshold for violation of the constraints
 * `u_min`: the lower bound for the smoothing parameter and threshold for violation of the constraints
 * `θ_ϵ`: the scaling factor of the tolerance parameter
@@ -20,9 +20,15 @@ Describes the exact penalty method, with
 
 # Constructor
 
-    ExactPenaltyMethodState(M::AbstractManifold, p, sub_problem, sub_state; kwargs...)
+    ExactPenaltyMethodState(M::AbstractManifold, sub_problem, sub_state; kwargs...)
 
-construct an exact penalty state.
+construct the exact penalty state.
+
+    ExactPenaltyMethodState(M::AbstractManifold, sub_problem;
+        evaluation=AllocatingEvaluation(), kwargs...
+    )
+
+construct the exact penalty state, where `sub_problem` is a closed-form solution and `evaluation` specifies its evaluation type.
 
 # Keyword arguments
 
@@ -34,10 +40,11 @@ construct an exact penalty state.
 * `u_min=1e-6`
 * `u_exponent=1 / 100`: a shortcut for the scaling factor ``θ_u``.
* `θ_u=(u_min / u)^(u_exponent)` +$(_var(:Keyword, :p; add=:as_Initial)) * `ρ=1.0` * `θ_ρ=0.3` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(300)`$(_sc_any)` (` - [`StopWhenSmallerOrEqual`](@ref)`(:ϵ, ϵ_min)`$(_sc_any)[`StopWhenChangeLess`](@ref)`(1e-10) )` +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(300)`$(_sc(:Any))` (`")) + [`StopWhenSmallerOrEqual`](@ref)`(:ϵ, ϵ_min)`$(_sc(:Any))[`StopWhenChangeLess`](@ref)`(1e-10) )` # See also @@ -63,10 +70,10 @@ mutable struct ExactPenaltyMethodState{ θ_ϵ::R stop::TStopping function ExactPenaltyMethodState( - ::AbstractManifold, - p::P, + M::AbstractManifold, sub_problem::Pr, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}; + sub_state::St; + p::P=rand(M), ϵ::R=1e-3, ϵ_min::R=1e-6, ϵ_exponent=1 / 100, @@ -78,9 +85,15 @@ mutable struct ExactPenaltyMethodState{ ρ::R=1.0, θ_ρ::R=0.3, stopping_criterion::SC=StopAfterIteration(300) | ( - StopWhenSmallerOrEqual(:ϵ, ϵ_min) | StopWhenChangeLess(1e-10) + StopWhenSmallerOrEqual(:ϵ, ϵ_min) | StopWhenChangeLess(M, 1e-10) ), - ) where {P,Pr<:Union{F,AbstractManoptProblem} where {F},R<:Real,SC<:StoppingCriterion} + ) where { + P, + Pr<:Union{F,AbstractManoptProblem} where {F}, + St<:AbstractManoptSolverState, + R<:Real, + SC<:StoppingCriterion, + } sub_state_storage = maybe_wrap_evaluation_type(sub_state) epms = new{P,Pr,typeof(sub_state_storage),R,SC}() epms.p = p @@ -98,6 +111,13 @@ mutable struct ExactPenaltyMethodState{ return epms end end +function ExactPenaltyMethodState( + M::AbstractManifold, sub_problem; evaluation::E=AllocatingEvaluation(), kwargs... +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return ExactPenaltyMethodState(M, sub_problem, cfs; kwargs...) 
+end + get_iterate(epms::ExactPenaltyMethodState) = epms.p function get_message(epms::ExactPenaltyMethodState) # for now only the sub solver might have messages @@ -168,9 +188,9 @@ _doc_EPM = """ perform the exact penalty method (EPM) [LiuBoumal:2019](@cite) The aim of the EPM is to find a solution of the constrained optimisation task -$(_problem_constrained) +$(_problem(:Constrained)) -where `M` is a Riemannian manifold, and ``f``, ``$(_math_sequence("g", "i", "1", "n"))`` and ``$(_math_sequence("h", "j", "1", "m"))`` +where `M` is a Riemannian manifold, and ``f``, ``$(_math(:Sequence, "g", "i", "1", "n"))`` and ``$(_math(:Sequence, "h", "j", "1", "m"))`` are twice continuously differentiable functions from `M` to ℝ. For that a weighted ``L_1``-penalty term for the violation of the constraints is added to the objective @@ -179,7 +199,7 @@ $(_doc_EPM_penalty) Since this is non-smooth, a [`SmoothingTechnique`](@ref) with parameter `u` is applied, see the [`ExactPenaltyCost`](@ref). -In every step ``k`` of the exact penalty method, the smoothed objective is then minimized over all ``p ∈$(_l_M)``. +In every step ``k`` of the exact penalty method, the smoothed objective is then minimized over all ``p ∈$(_math(:M))``. 
Then, the accuracy tolerance ``ϵ`` and the smoothing parameter ``u`` are updated by setting $(_doc_EMP_ϵ_update) @@ -192,10 +212,10 @@ $(_doc_EMP_ρ_update) # Input -* $(_arg_M) -* $(_arg_f) -* $(_arg_grad_f) -* $(_arg_p) +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :grad_f)) +$(_var(:Argument, :p)) # Keyword arguments if not called with the [`ConstrainedManifoldObjective`](@ref) `cmo` @@ -229,22 +249,22 @@ Otherwise the problem is not constrained and a better solver would be for exampl * `min_stepsize=1e-10`: the minimal step size * `smoothing=`[`LogarithmicSumOfExponentials`](@ref): a [`SmoothingTechnique`](@ref) to use * `sub_cost=`[`ExactPenaltyCost`](@ref)`(problem, ρ, u; smoothing=smoothing)`: cost to use in the sub solver - $(_kw_used_in("sub_problem")) + $(_note(:KeywordUsedIn, "sub_problem")) * `sub_grad=`[`ExactPenaltyGrad`](@ref)`(problem, ρ, u; smoothing=smoothing)`: gradient to use in the sub solver - $(_kw_used_in("sub_problem")) -* * $(_kw_sub_kwargs_default): $(_kw_sub_kwargs) -* `sub_stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(ϵ)`$(_sc_any)[`StopWhenStepsizeLess`](@ref)`(1e-10)`: a stopping cirterion for the sub solver - $(_kw_used_in("sub_state")) -* `sub_problem=`[`DefaultManoptProblem`](@ref)`(M, `[`ManifoldGradientObjective`](@ref)`(sub_cost, sub_grad; evaluation=evaluation)`: the problem for the subsolver. The objective can also be decorated with argumens from `sub_kwargs`. -* `sub_state=`[`QuasiNewtonState`](@ref)`(...)` a solver to use for the sub problem. By default an L-BFGS is used. 
-* `stopping_criterion=`[`StopAfterIteration`](@ref)`(300)`$(_sc_any)` ( `[`StopWhenSmallerOrEqual`](@ref)`(ϵ, ϵ_min)`$(_sc_all)[`StopWhenChangeLess`](@ref)`(1e-10) )`: $(_kw_stopping_criterion)
+    $(_note(:KeywordUsedIn, "sub_problem"))
+$(_var(:Keyword, :sub_kwargs))
+* `sub_stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(ϵ)`$(_sc(:Any))[`StopWhenStepsizeLess`](@ref)`(1e-10)`: a stopping criterion for the sub solver
+    $(_note(:KeywordUsedIn, "sub_state"))
+$(_var(:Keyword, :sub_problem; default="[`DefaultManoptProblem`](@ref)`(M, `[`ManifoldGradientObjective`](@ref)`(sub_cost, sub_grad; evaluation=evaluation)"))
+$(_var(:Keyword, :sub_state; default="[`QuasiNewtonState`](@ref)", add=" where [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref) with [`InverseBFGS`](@ref) is used"))
+$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(300)`$(_sc(:Any))` ( `[`StopWhenSmallerOrEqual`](@ref)`(ϵ, ϵ_min)`$(_sc(:All))[`StopWhenChangeLess`](@ref)`(1e-10) )`"))
 
 For the `range`s of the constraints' gradient, other power manifold tangent space representations,
 mainly the [`ArrayPowerRepresentation`](@extref Manifolds :jl:type:`Manifolds.ArrayPowerRepresentation`)
 can be used if the gradients can be computed more efficiently in that representation.
-$(_kw_others)
+$(_note(:OtherKeywords))
 
-$_doc_sec_output
+$(_note(:OutputSection))
 """
 
 @doc "$(_doc_EPM)"
@@ -262,57 +282,40 @@ function exact_penalty_method(
     grad_g=nothing,
     grad_h=nothing,
     evaluation::AbstractEvaluationType=AllocatingEvaluation(),
-    inequality_constrains::Union{Integer,Nothing}=nothing,
-    equality_constrains::Union{Nothing,Integer}=nothing,
+    inequality_constraints::Union{Integer,Nothing}=nothing,
+    equality_constraints::Union{Nothing,Integer}=nothing,
     kwargs...,
 ) where {TF,TGF}
+    p_ = _ensure_mutating_variable(p)
+    f_ = _ensure_mutating_cost(f, p)
+    grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation)
+    g_ = _ensure_mutating_cost(g, p)
+    grad_g_ = _ensure_mutating_gradient(grad_g, p, evaluation)
+    h_ = _ensure_mutating_cost(h, p)
+    grad_h_ = _ensure_mutating_gradient(grad_h, p, evaluation)
     cmo = ConstrainedManifoldObjective(
-        f,
-        grad_f,
-        g,
-        grad_g,
-        h,
-        grad_h;
+        f_,
+        grad_f_,
+        g_,
+        grad_g_,
+        h_,
+        grad_h_;
         evaluation=evaluation,
-        equality_constrains=equality_constrains,
-        inequality_constrains=inequality_constrains,
+        equality_constraints=equality_constraints,
+        inequality_constraints=inequality_constraints,
         M=M,
-        p=p,
+        p=p_,
     )
-    return exact_penalty_method(
+    rs = exact_penalty_method(
         M,
         cmo,
-        p;
+        p_;
         evaluation=evaluation,
-        equality_constrains=equality_constrains,
-        inequality_constrains=inequality_constrains,
+        equality_constraints=equality_constraints,
+        inequality_constraints=inequality_constraints,
         kwargs...,
     )
-end
-function exact_penalty_method(
-    M::AbstractManifold,
-    f,
-    grad_f,
-    p::Number;
-    g=nothing,
-    h=nothing,
-    grad_g=nothing,
-    grad_h=nothing,
-    evaluation::AbstractEvaluationType=AllocatingEvaluation(),
-    kwargs...,
-)
-    q = [p]
-    f_(M, p) = f(M, p[])
-    grad_f_ = _to_mutating_gradient(grad_f, evaluation)
-    g_ = isnothing(g) ? nothing : (M, p) -> g(M, p[])
-    grad_g_ = isnothing(grad_g) ? nothing : _to_mutating_gradient(grad_g, evaluation)
-    h_ = isnothing(h) ?
nothing : (M, p) -> h(M, p[]) - grad_h_ = isnothing(grad_h) ? nothing : _to_mutating_gradient(grad_h, evaluation) - cmo = ConstrainedManifoldObjective( - f_, grad_f_, g_, grad_g_, h_, grad_h_; evaluation=evaluation, M=M, p=p - ) - rs = exact_penalty_method(M, cmo, q; evaluation=evaluation, kwargs...) - return (typeof(q) == typeof(rs)) ? rs[] : rs + return _ensure_matching_output(p, rs) end function exact_penalty_method( M::AbstractManifold, cmo::O, p=rand(M); kwargs... @@ -333,15 +336,15 @@ function exact_penalty_method!( grad_g=nothing, grad_h=nothing, evaluation::AbstractEvaluationType=AllocatingEvaluation(), - inequality_constrains=nothing, - equality_constrains=nothing, + inequality_constraints=nothing, + equality_constraints=nothing, kwargs..., ) - if isnothing(inequality_constrains) - inequality_constrains = _number_of_constraints(g, grad_g; M=M, p=p) + if isnothing(inequality_constraints) + inequality_constraints = _number_of_constraints(g, grad_g; M=M, p=p) end - if isnothing(equality_constrains) - equality_constrains = _number_of_constraints(h, grad_h; M=M, p=p) + if isnothing(equality_constraints) + equality_constraints = _number_of_constraints(h, grad_h; M=M, p=p) end cmo = ConstrainedManifoldObjective( f, @@ -351,8 +354,8 @@ function exact_penalty_method!( h, grad_h; evaluation=evaluation, - equality_constrains=equality_constrains, - inequality_constrains=inequality_constrains, + equality_constraints=equality_constraints, + inequality_constraints=inequality_constraints, M=M, p=p, ) @@ -361,8 +364,8 @@ function exact_penalty_method!( cmo, p; evaluation=evaluation, - equality_constrains=equality_constrains, - inequality_constrains=inequality_constrains, + equality_constraints=equality_constraints, + inequality_constraints=inequality_constraints, kwargs..., ) end @@ -403,8 +406,8 @@ function exact_penalty_method!( StopWhenStepsizeLess(1e-8), sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}=decorate_state!( QuasiNewtonState( - M, - copy(M, 
p); + M; + p=copy(M, p), initial_vector=zero_vector(M, p), direction_update=QuasiNewtonLimitedMemoryDirectionUpdate( M, copy(M, p), InverseBFGS(), 30 @@ -416,7 +419,7 @@ function exact_penalty_method!( sub_kwargs..., ), stopping_criterion::StoppingCriterion=StopAfterIteration(300) | ( - StopWhenSmallerOrEqual(:ϵ, ϵ_min) & StopWhenChangeLess(1e-10) + StopWhenSmallerOrEqual(:ϵ, ϵ_min) & StopWhenChangeLess(M, 1e-10) ), kwargs..., ) where { @@ -426,9 +429,9 @@ function exact_penalty_method!( sub_state_storage = maybe_wrap_evaluation_type(sub_state) emps = ExactPenaltyMethodState( M, - p, sub_problem, sub_state_storage; + p=p, ϵ=ϵ, ϵ_min=ϵ_min, u=u, @@ -465,12 +468,12 @@ function step_solver!( ) where {P} M = get_manifold(amp) # use subsolver to minimize the smoothed penalized function - set_manopt_parameter!(epms.sub_problem, :Objective, :Cost, :ρ, epms.ρ) - set_manopt_parameter!(epms.sub_problem, :Objective, :Cost, :u, epms.u) - set_manopt_parameter!(epms.sub_problem, :Objective, :Gradient, :ρ, epms.ρ) - set_manopt_parameter!(epms.sub_problem, :Objective, :Gradient, :u, epms.u) + set_parameter!(epms.sub_problem, :Objective, :Cost, :ρ, epms.ρ) + set_parameter!(epms.sub_problem, :Objective, :Cost, :u, epms.u) + set_parameter!(epms.sub_problem, :Objective, :Gradient, :ρ, epms.ρ) + set_parameter!(epms.sub_problem, :Objective, :Gradient, :u, epms.u) set_iterate!(epms.sub_state, M, copy(M, epms.p)) - update_stopping_criterion!(epms, :MinIterateChange, epms.ϵ) + set_parameter!(epms, :StoppingCriterion, :MinIterateChange, epms.ϵ) epms.p = get_solver_result(solve!(epms.sub_problem, epms.sub_state)) diff --git a/src/solvers/gradient_descent.jl b/src/solvers/gradient_descent.jl index fc74d68719..66f36764fd 100644 --- a/src/solvers/gradient_descent.jl +++ b/src/solvers/gradient_descent.jl @@ -6,32 +6,32 @@ Describes the state of a gradient based descent algorithm. 
# Fields -* $_field_iterate -* $_field_gradient -* $_field_stop -* $_field_step +$(_var(:Field, :p; add=[:as_Iterate])) +$(_var(:Field, :X; add=[:as_Gradient])) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :stepsize)) * `direction::`[`DirectionUpdateRule`](@ref) : a processor to handle the obtained gradient and compute a direction to “walk into”. -* $_field_retr +$(_var(:Field, :retraction_method)) # Constructor - GradientDescentState(M, p=rand(M); kwargs...) + GradientDescentState(M::AbstractManifold; kwargs...) Initialize the gradient descent solver state, where ## Input -$_arg_M -$_arg_p +$(_var(:Argument, :M; type=true)) ## Keyword arguments * `direction=`[`IdentityUpdateRule`](@ref)`()` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(100)` $_kw_stop_note -* `stepsize=`[`default_stepsize`](@ref)`(M, GradientDescentState; retraction_method=retraction_method)` -* $_kw_retraction_method_default -* $_kw_X_default +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(100)`")) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, GradientDescentState; retraction_method=retraction_method)`")) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :X; add=:as_Memory)) # See also @@ -51,40 +51,27 @@ mutable struct GradientDescentState{ stepsize::TStepsize stop::TStop retraction_method::TRTM - function GradientDescentState{P,T}( - M::AbstractManifold, - p::P, - X::T, - stop::StoppingCriterion=StopAfterIteration(100), - step::Stepsize=default_stepsize(M, GradientDescentState), - retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), - direction::DirectionUpdateRule=IdentityUpdateRule(), - ) where {P,T} - s = new{P,T,typeof(stop),typeof(step),typeof(direction),typeof(retraction_method)}() - s.direction = direction - s.p = p - s.retraction_method = retraction_method - s.stepsize = step - s.stop = stop - s.X = X - return s - end end - function 
GradientDescentState( - M::AbstractManifold, - p::P=rand(M); + M::AbstractManifold; + p::P=rand(M), X::T=zero_vector(M, p), - stopping_criterion::StoppingCriterion=StopAfterIteration(200) | - StopWhenGradientNormLess(1e-8), - retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), - stepsize::Stepsize=default_stepsize( + stopping_criterion::SC=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8), + retraction_method::RTM=default_retraction_method(M, typeof(p)), + stepsize::S=default_stepsize( M, GradientDescentState; retraction_method=retraction_method ), - direction::DirectionUpdateRule=IdentityUpdateRule(), -) where {P,T} - return GradientDescentState{P,T}( - M, p, X, stopping_criterion, stepsize, retraction_method, direction + direction::D=IdentityUpdateRule(), +) where { + P, + T, + SC<:StoppingCriterion, + RTM<:AbstractRetractionMethod, + S<:Stepsize, + D<:DirectionUpdateRule, +} + return GradientDescentState{P,T,SC,S,D,RTM}( + p, X, direction, stepsize, stopping_criterion, retraction_method ) end function (r::IdentityUpdateRule)(mp::AbstractManoptProblem, s::GradientDescentState, k) @@ -96,7 +83,9 @@ function default_stepsize( retraction_method=default_retraction_method(M), ) # take a default with a slightly defensive initial step size. - return ArmijoLinesearch(M; retraction_method=retraction_method, initial_stepsize=1.0) + return ArmijoLinesearchStepsize( + M; retraction_method=retraction_method, initial_stepsize=1.0 + ) end function get_message(gds::GradientDescentState) # for now only step size is quipped with messages @@ -142,12 +131,12 @@ The algorithm can be performed in-place of `p`. 
# Input -$_arg_M -$_arg_f -$_arg_grad_f -$_arg_p +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :grad_f)) +$(_var(:Argument, :p)) -$_arg_alt_mgo +$(_note(:GradientObjective)) # Keyword arguments @@ -155,63 +144,39 @@ $_arg_alt_mgo specify to perform a certain processing of the direction, for example [`Nesterov`](@ref), [`MomentumGradient`](@ref) or [`AverageGradient`](@ref). -* $_kw_evaluation_default: - $_kw_evaluation $_kw_evaluation_example - -* $_kw_retraction_method_default: - $_kw_retraction_method - -* `stepsize=`[`default_stepsize`](@ref)`(M, GradientDescentState)`: - $_kw_stepsize - -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`$_sc_any[`StopWhenGradientNormLess`](@ref)`(1e-8)`: - $_kw_stopping_criterion - -* $_kw_X_default: - $_kw_X, the evaluated gradient ``$_l_grad f`` evaluated at ``p^{(k)}``. +$(_var(:Keyword, :evaluation; add=:GradientExample)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, GradientDescentState)`")) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-8)`")) +$(_var(:Keyword, :X; add=:as_Gradient)) -$_kw_others +$(_note(:OtherKeywords)) If you provide the [`ManifoldGradientObjective`](@ref) directly, the `evaluation=` keyword is ignored. The decorations are still applied to the objective. -$_doc_remark_tutorial_debug +$(_note(:TutorialMode)) -$_doc_sec_output +$(_note(:OutputSection)) """ @doc "$(_doc_gradient_descent)" gradient_descent(M::AbstractManifold, args...; kwargs...) -function gradient_descent(M::AbstractManifold, f, grad_f; kwargs...) - return gradient_descent(M, f, grad_f, rand(M); kwargs...) 
-end -function gradient_descent( - M::AbstractManifold, - f, - grad_f, - p; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) - mgo = ManifoldGradientObjective(f, grad_f; evaluation=evaluation) - return gradient_descent(M, mgo, p; kwargs...) -end function gradient_descent( M::AbstractManifold, f, grad_f, - p::Number; + p=rand(M); evaluation::AbstractEvaluationType=AllocatingEvaluation(), kwargs..., ) - # redefine initial point - q = [p] - f_(M, p) = f(M, p[]) - grad_f_ = _to_mutating_gradient(grad_f, evaluation) - rs = gradient_descent(M, f_, grad_f_, q; evaluation=evaluation, kwargs...) - #return just a number if the return type is the same as the type of q - return (typeof(q) == typeof(rs)) ? rs[] : rs + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation) + mgo = ManifoldGradientObjective(f_, grad_f_; evaluation=evaluation) + rs = gradient_descent(M, mgo, p_; kwargs...) + return _ensure_matching_output(p, rs) end function gradient_descent( M::AbstractManifold, mgo::O, p; kwargs... 
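The hunk above collapses the separate `p::Number` convenience method into the general one by normalizing the start point before the solver runs. A minimal plain-Julia sketch of that scalar round trip follows; the helper names mirror `_ensure_mutating_variable` and `_ensure_matching_output` from the diff, but these one-line stand-ins are illustrative only (the real Manopt.jl helpers also wrap costs and gradients and respect the evaluation type):

```julia
# Illustrative stand-ins for the wrapping helpers used in the diff above;
# the actual Manopt.jl implementations handle more cases.
_ensure_mutating_variable(p::Number) = [p]  # wrap a scalar into a mutable 1-element array
_ensure_mutating_variable(p) = p            # arrays/points pass through unchanged

# unwrap again only when a scalar went in and a wrapped array comes back
_ensure_matching_output(p::Number, rs::AbstractArray) = rs[]
_ensure_matching_output(p, rs) = rs

p = 2.0
q = _ensure_mutating_variable(p)  # the solver can now mutate the iterate in place
q[] = q[] - 0.5                   # stand-in for a solver step
result = _ensure_matching_output(p, q)
```

This is why a single method with `p=rand(M)` suffices: scalar start points are transparently lifted to mutable containers and the result is unwrapped to match the input type.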
@@ -238,13 +203,14 @@ function gradient_descent!( mgo::O, p; retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), - stepsize::Stepsize=default_stepsize( + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=default_stepsize( M, GradientDescentState; retraction_method=retraction_method ), stopping_criterion::StoppingCriterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8), debug=if is_tutorial_mode() - if (stepsize isa ConstantStepsize) + if (stepsize isa ManifoldDefaultsFactory{Manopt.ConstantStepsize}) + # If you pass the step size (internal) directly, this is considered expert mode [DebugWarnIfCostIncreases(), DebugWarnIfGradientNormTooLarge()] else [DebugWarnIfGradientNormTooLarge()] @@ -252,18 +218,18 @@ function gradient_descent!( else [] end, - direction=IdentityUpdateRule(), + direction=Gradient(), X=zero_vector(M, p), kwargs..., #collect rest ) where {O<:Union{AbstractManifoldGradientObjective,AbstractDecoratedManifoldObjective}} dmgo = decorate_objective!(M, mgo; kwargs...) dmp = DefaultManoptProblem(M, dmgo) s = GradientDescentState( - M, - p; + M; + p=p, stopping_criterion=stopping_criterion, - stepsize=stepsize, - direction=direction, + stepsize=_produce_type(stepsize, M), + direction=_produce_type(direction, M), retraction_method=retraction_method, X=X, ) diff --git a/src/solvers/interior_point_Newton.jl b/src/solvers/interior_point_Newton.jl index ac4df5f8c9..112774d569 100644 --- a/src/solvers/interior_point_Newton.jl +++ b/src/solvers/interior_point_Newton.jl @@ -3,7 +3,7 @@ _doc_IPN_subsystem = ```math \operatorname{J} F(p, μ, λ, s)[X, Y, Z, W] = -F(p, μ, λ, s), \text{ where } -""" * "X ∈ $(_l_TpM()), Y,W ∈ ℝ^m, Z ∈ ℝ^n,\n```\n" +""" * "X ∈ $(_math(:TpM)), Y,W ∈ ℝ^m, Z ∈ ℝ^n,\n```\n" _doc_IPN = """ interior_point_Newton(M, f, grad_f, Hess_f, p=rand(M); kwargs...) interior_point_Newton(M, cmo::ConstrainedManifoldObjective, p=rand(M); kwargs...) 
@@ -14,7 +14,7 @@ perform the interior point Newton method following [LaiYoshise:2024](@cite). In order to solve the constrained problem -$_problem_constrained +$(_problem(:Constrained)) This algorithms iteratively solves the linear system based on extending the KKT system by a slack variable `s`. @@ -37,11 +37,11 @@ the constraints are further fulfilled. # Input -* `M`: a manifold ``$(_l_M)`` -* `f`: a cost function ``f : $(_l_M) → ℝ`` to minimize -* `grad_f`: the gradient ``$(_l_grad) f : $(_l_M) → T $(_l_M)`` of ``f`` -* `Hess_f`: the Hessian ``$(_l_Hess)f(p): T_p$(_l_M) → T_p$(_l_M)``, ``X ↦ $(_l_Hess)f(p)[X] = ∇_X$(_l_grad)f(p)`` -* `p=$(_link_rand()): an initial value ``p ∈ $(_l_M)`` +$(_var(:Argument, :M)) +$(_var(:Argument, :f)) +$(_var(:Argument, :grad_f)) +$(_var(:Argument, :Hess_f)) +$(_var(:Argument, :p)) or a [`ConstrainedManifoldObjective`](@ref) `cmo` containing `f`, `grad_f`, `Hess_f`, and the constraints @@ -54,8 +54,7 @@ pass a [`ConstrainedManifoldObjective`](@ref) `cmo` This can be used to ensure that the resulting iterate is still an interior point if you provide a check `(N,q) -> true/false`, where `N` is the manifold of the `step_problem`. * `equality_constraints=nothing`: the number ``n`` of equality constraints. -* `evaluation=`[`AllocatingEvaluation`](@ref)`()`: - specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result ([`AllocatingEvaluation`](@ref)) or whether they modify their input argument to return the result therein ([`InplaceEvaluation`](@ref)). Since usually the first argument is the manifold, the modified argument is the second. 
+$(_var(:Keyword, :evaluation)) * `g=nothing`: the inequality constraints * `grad_g=nothing`: the gradient of the inequality constraints * `grad_h=nothing`: the gradient of the equality constraints @@ -68,28 +67,25 @@ pass a [`ConstrainedManifoldObjective`](@ref) `cmo` * `inequality_constraints=nothing`: the number ``m`` of inequality constraints. * `λ=ones(length(h(M, p)))`: the Lagrange multiplier with respect to the equality constraints ``h`` * `μ=ones(length(g(M, p)))`: the Lagrange multiplier with respect to the inequality constraints ``g`` -* `retraction_method=`[`default_retraction_method`](@extref `ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}`)`(M, typeof(p))`: - the retraction to use, defaults to the default set `M` with respect to the representation for `p` chosen. +$(_var(:Keyword, :retraction_method)) * `ρ=μ's / length(μ)`: store the orthogonality `μ's/m` to compute the barrier parameter `β` in the sub problem. * `s=copy(μ)`: initial value for the slack variables * `σ=`[`calculate_σ`](@ref)`(M, cmo, p, μ, λ, s)`: scaling factor for the barrier parameter `β` in the sub problem, which is updated during the iterations * `step_objective`: a [`ManifoldGradientObjective`](@ref) of the norm of the KKT vector field [`KKTVectorFieldNormSq`](@ref) and its gradient [`KKTVectorFieldNormSqGradient`](@ref) -* `step_problem`: the manifold ``$(_l_M) × ℝ^m × ℝ^n × ℝ^m`` together with the `step_objective` +* `step_problem`: the manifold ``$(_math(:M)) × ℝ^m × ℝ^n × ℝ^m`` together with the `step_objective` as the problem the linesearch `stepsize=` employs for determining a step size * `step_state`: the [`StepsizeState`](@ref) with point and search direction -* `stepsize` an [`ArmijoLinesearch`](@ref) with the [`InteriorPointCentralityCondition`](@ref) as - additional condition to accept a step. 
Note that this step size operates on its own `step_problem`and `step_state` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(200)`[` | `](@ref StopWhenAny)[`StopWhenKKTResidualLess`](@ref)`(1e-8)`: +$(_var(:Keyword, :stepsize; default="[`ArmijoLinesearch`](@ref)`()`", add=" with the `centrality_condition` keyword as additional criterion to accept a step, if this is provided")) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(200)`[` | `](@ref StopWhenAny)[`StopWhenKKTResidualLess`](@ref)`(1e-8)`")) a stopping criterion, by default depending on the residual of the KKT vector field or a maximal number of steps, which ever hits first. * `sub_kwargs=(;)`: keyword arguments to decorate the sub options, for example debug, that automatically respects the main solvers debug options (like sub-sampling) as well * `sub_objective`: The [`SymmetricLinearSystemObjective`](@ref) modelling the system of equations to use in the sub solver, - includes the [`CondensedKKTVectorFieldJacobian`](@ref) ``$(_l_cal("A"))(X)`` and the [`CondensedKKTVectorField`](@ref) ``b`` in ``$(_l_cal("A"))(X) + b = 0`` we aim to solve. - $(_kw_used_in("sub_problem")) -* `sub_stopping_criterion=`[`StopAfterIteration`](@ref)`(manifold_dimension(M))`[` | `](@ref StopWhenAny)[`StopWhenRelativeResidualLess`](@ref)`(c,1e-8)`, where ``c = $(_l_norm("b"))`` from the system to solve. - $(_kw_used_in("sub_state")) -* `sub_problem`: combining the `sub_objective` and the tangent space at ``(p,λ)``` on the manifold ``$(_l_M) × ℝ^n`` to a manopt problem. - This is the manifold and objective for the sub solver. -* `sub_state=`[`ConjugateResidualState`](@ref): a state specifying the subsolver. This default is also decorated with the `sub_kwargs...`. + includes the [`CondensedKKTVectorFieldJacobian`](@ref) ``$(_tex(:Cal, "A"))(X)`` and the [`CondensedKKTVectorField`](@ref) ``b`` in ``$(_tex(:Cal, "A"))(X) + b = 0`` we aim to solve.
+ $(_note(:KeywordUsedIn, "sub_problem")) +* `sub_stopping_criterion=`[`StopAfterIteration`](@ref)`(manifold_dimension(M))`[` | `](@ref StopWhenAny)[`StopWhenRelativeResidualLess`](@ref)`(c,1e-8)`, where ``c = $(_tex(:norm,"b"))`` from the system to solve. + $(_note(:KeywordUsedIn, "sub_state")) +$(_var(:Keyword, :sub_problem; default="[`DefaultManoptProblem`](@ref)`(M, sub_objective)`")) +$(_var(:Keyword, :sub_state; default="[`ConjugateResidualState`](@ref)")) * `vector_space=`[`Rn`](@ref Manopt.Rn) a function that, given an integer, returns the manifold to be used for the vector space components ``ℝ^m,ℝ^n`` * `X=`[`zero_vector`](@extref `ManifoldsBase.zero_vector-Tuple{AbstractManifold, Any}`)`(M,p)`: th initial gradient with respect to `p`. @@ -134,7 +130,6 @@ function interior_point_Newton( equality_constrains::Union{Nothing,Integer}=nothing, kwargs..., ) - q = copy(M, p) cmo = ConstrainedManifoldObjective( f, grad_f, @@ -151,7 +146,7 @@ function interior_point_Newton( M=M, p=p, ) - return interior_point_Newton!(M, cmo, q; evaluation=evaluation, kwargs...) + return interior_point_Newton(M, cmo, p; evaluation=evaluation, kwargs...) end function interior_point_Newton( M::AbstractManifold, cmo::O, p; kwargs... @@ -176,8 +171,8 @@ function interior_point_Newton!( grad_h=nothing, Hess_g=nothing, Hess_h=nothing, - inequality_constrains=nothing, - equality_constrains=nothing, + inequality_constraints=nothing, + equality_constraints=nothing, kwargs..., ) cmo = ConstrainedManifoldObjective( @@ -191,21 +186,13 @@ function interior_point_Newton!( hess_g=Hess_g, hess_h=Hess_h, evaluation=evaluation, - equality_constrains=equality_constrains, - inequality_constrains=inequality_constrains, + equality_constraints=equality_constraints, + inequality_constraints=inequality_constraints, M=M, p=p, ) dcmo = decorate_objective!(M, cmo; kwargs...) 
- return interior_point_Newton!( - M, - dcmo, - p; - evaluation=evaluation, - equality_constrains=equality_constrains, - inequality_constrains=inequality_constrains, - kwargs..., - ) + return interior_point_Newton!(M, dcmo, p; evaluation=evaluation, kwargs...) end function interior_point_Newton!( M::AbstractManifold, @@ -234,7 +221,7 @@ function interior_point_Newton!( step_problem=DefaultManoptProblem(_step_M, step_objective), _step_p=rand(_step_M), step_state=StepsizeState(_step_p, zero_vector(_step_M, _step_p)), - stepsize::Stepsize=ArmijoLinesearch( + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=ArmijoLinesearch( _step_M; retraction_method=default_retraction_method(_step_M), initial_guess=interior_point_initial_guess, @@ -284,9 +271,9 @@ function interior_point_Newton!( ips = InteriorPointNewtonState( M, cmo, - p, sub_problem, sub_state; + p=p, X=X, Y=Y, Z=Z, @@ -298,14 +285,17 @@ function interior_point_Newton!( retraction_method=retraction_method, step_problem=step_problem, step_state=step_state, - stepsize=stepsize, + stepsize=_produce_type(stepsize, _step_M), kwargs..., ) ips = decorate_state!(ips; kwargs...) 
solve!(dmp, ips) return get_solver_return(get_objective(dmp), ips) end -function initialize_solver!(::AbstractManoptProblem, ips::InteriorPointNewtonState) +function initialize_solver!(amp::AbstractManoptProblem, ips::InteriorPointNewtonState) + M = get_manifold(amp) + cmo = get_objective(amp) + !is_feasible(M, cmo, ips.p; error=:error) return ips end @@ -318,11 +308,11 @@ function step_solver!(amp::AbstractManoptProblem, ips::InteriorPointNewtonState, copyto!(N[2], q[N, 2], ips.λ) set_iterate!(ips.sub_state, get_manifold(ips.sub_problem), zero_vector(N, q)) - set_manopt_parameter!(ips.sub_problem, :Manifold, :Basepoint, q) - set_manopt_parameter!(ips.sub_problem, :Objective, :μ, ips.μ) - set_manopt_parameter!(ips.sub_problem, :Objective, :λ, ips.λ) - set_manopt_parameter!(ips.sub_problem, :Objective, :s, ips.s) - set_manopt_parameter!(ips.sub_problem, :Objective, :β, ips.ρ * ips.σ) + set_parameter!(ips.sub_problem, :Manifold, :Basepoint, q) + set_parameter!(ips.sub_problem, :Objective, :μ, ips.μ) + set_parameter!(ips.sub_problem, :Objective, :λ, ips.λ) + set_parameter!(ips.sub_problem, :Objective, :s, ips.s) + set_parameter!(ips.sub_problem, :Objective, :β, ips.ρ * ips.σ) # product manifold on which to perform linesearch X2 = get_solver_result(solve!(ips.sub_problem, ips.sub_state)) @@ -356,11 +346,11 @@ function step_solver!(amp::AbstractManoptProblem, ips::InteriorPointNewtonState, (m > 0) && (copyto!(N[4], X[N, 4], ips.W)) set_gradient!(ips.step_state, M, q, X) # Update centrality factor – Maybe do this as an update function? 
- γ = get_manopt_parameter(ips.stepsize, :DecreaseCondition, :γ) + γ = get_parameter(ips.stepsize, :DecreaseCondition, :γ) if !isnothing(γ) - set_manopt_parameter!(ips.stepsize, :DecreaseCondition, :γ, (γ + 0.5) / 2) + set_parameter!(ips.stepsize, :DecreaseCondition, :γ, (γ + 0.5) / 2) end - set_manopt_parameter!(ips.stepsize, :DecreaseCondition, :τ, N, q) + set_parameter!(ips.stepsize, :DecreaseCondition, :τ, N, q) # determine stepsize α = ips.stepsize(ips.step_problem, ips.step_state, k) # Update Parameters and slack diff --git a/src/solvers/particle_swarm.jl b/src/solvers/particle_swarm.jl index 63648e2a7e..b32199394f 100644 --- a/src/solvers/particle_swarm.jl +++ b/src/solvers/particle_swarm.jl @@ -10,21 +10,21 @@ Describes a particle swarm optimizing algorithm, with * `cognitive_weight`: a cognitive weight factor * `inertia`: the inertia of the particles -* $(_field_inv_retr) -* $(_field_retr) +$(_var(:Field, :inverse_retraction_method)) +$(_var(:Field, :retraction_method)) * `social_weight`: a social weight factor -* $(_field_stop) -* $(_field_vector_transp) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :vector_transport_method)) * `velocity`: a set of tangent vectors (of type `AbstractVector{T}`) representing the velocities of the particles # Internal and temporary fields * `cognitive_vector`: temporary storage for a tangent vector related to `cognitive_weight` -* `p`: storage for the best point ``p`` visited by all particles. 
+$(_var(:Field, :p; add=" storing the best point visited by all particles")) * `positional_best`: storing the best position ``p_i`` every single swarm participant visited -* `q`: temporary storage for a point to avoid allocations during a step of the algorithm +$(_var(:Field, :p, "q"; add=" serving as temporary storage for interim results; avoids allocations")) * `social_vec`: temporary storage for a tangent vector related to `social_weight` -* `swarm`: a set of points (of type `AbstractVector{P}`) on a manifold ``$(_math_sequence("a","i","1","N"))`` +* `swarm`: a set of points (of type `AbstractVector{P}`) on a manifold ``$(_math(:Sequence, "a","i","1","N"))`` # Constructor @@ -37,11 +37,11 @@ The `p` used in the following defaults is the type of one point from the swarm. * `cognitive_weight=1.4` * `inertia=0.65` -* `inverse_retraction_method=default_inverse_retraction_method(M, eltype(swarm))`: an inverse retraction to use. -* $(_kw_retraction_method_default) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :retraction_method)) * `social_weight=1.4` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(500)`$(_sc_any)[`StopWhenChangeLess`](@ref)`(1e-4)` -* $(_kw_vector_transport_method_default) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(500)`$(_sc(:Any))[`StopWhenChangeLess`](@ref)`(1e-4)`")) +$(_var(:Keyword, :vector_transport_method)) # See also @@ -80,7 +80,7 @@ mutable struct ParticleSwarmState{ inertia=0.65, social_weight=1.4, cognitive_weight=1.4, - stopping_criterion::SCT=StopAfterIteration(500) | StopWhenChangeLess(1e-4), + stopping_criterion::SCT=StopAfterIteration(500) | StopWhenChangeLess(M, 1e-4), retraction_method::RTM=default_retraction_method(M, eltype(swarm)), inverse_retraction_method::IRM=default_inverse_retraction_method(M, eltype(swarm)), vector_transport_method::VTM=default_vector_transport_method(M, eltype(swarm)), @@ -143,10 +143,10 @@ function set_iterate!(pss::ParticleSwarmState, p)
pss.p = p return pss end -function set_manopt_parameter!(pss::ParticleSwarmState, ::Val{:Population}, swarm) +function set_parameter!(pss::ParticleSwarmState, ::Val{:Population}, swarm) return pss.swarm = swarm end -function get_manopt_parameter(pss::ParticleSwarmState, ::Val{:Population}) +function get_parameter(pss::ParticleSwarmState, ::Val{:Population}) return pss.swarm end @@ -154,11 +154,7 @@ end # Constructors # _doc_swarm = raw"``S = \{s_1, \ldots, s_n\}``" -_doc_velocities = raw""" -```math - X_k^{(i)} = ω \, \operatorname{T}_{s_k^{(i)}\gets s_k^{(i-1)}}X_k^{(i-1)} + c r_1 \operatorname{retr}_{s_k^{(i)}}^{-1}(p_k^{(i)}) + s r_2 \operatorname{retr}_{s_k^{(i)}}^{-1}(p), -``` -""" + _doc_particle_update = raw""" ```math s_k^{(i+1)} = \operatorname{retr}_{s_k^{(i)}}(X_k^{(i)}), @@ -191,7 +187,7 @@ _doc_PSO = """ perform the particle swarm optimization algorithm (PSO) to solve -$(_problem_default) +$(_problem(:Default)) PSO starts with an initial `swarm` [BorckmansIshtevaAbsil:2010](@cite) of points on the manifold. @@ -200,14 +196,19 @@ The computation can be perfomed in-place of `swarm`. To this end, a swarm $_doc_swarm of particles is moved around the manifold `M` in the following manner. 
For every particle ``s_k^{(i)}`` the new particle velocities ``X_k^{(i)}`` are computed in every step ``i`` of the algorithm by -$_doc_velocities + +```math +X_k^{(i)} = ω $(_math(:vector_transport, :symbol, "s_k^{(i)}", "s_k^{(i-1)}")) X_k^{(i-1)} + c r_1 $(_tex(:invretr))_{s_k^{(i)}}(p_k^{(i)}) + s r_2 $(_tex(:invretr))_{s_k^{(i)}}(p), +``` + where * ``s_k^{(i)}`` is the current particle position, * ``ω`` denotes the inertia, * ``c`` and ``s`` are a cognitive and a social weight, respectively, * ``r_j``, ``j=1,2`` are random factors which are computed new for each particle and step -* $_math_VT and $_math_inv_retr +* $(_math(:vector_transport, :symbol)) is a vector transport, and +* $(_tex(:invretr)) is an inverse retraction Then the position of the particle is updated as @@ -223,8 +224,8 @@ $_doc_swarm_global_best # Input -$_arg_M -$_arg_f +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) * `swarm = [rand(M) for _ in 1:swarm_size]`: an initial swarm of points. Instead of a cost function `f` you can also provide an [`AbstractManifoldCostObjective`](@ref) `mco`. @@ -233,56 +234,37 @@ Instead of a cost function `f` you can also provide an [`AbstractManifoldCostObj * `cognitive_weight=1.4`: a cognitive weight factor * `inertia=0.65`: the inertia of the particles -* `inverse_retraction_method=default_inverse_retraction_method(M, eltype(swarm))`: an inverse retraction to use.
-* $(_kw_retraction_method_default): $(_kw_retraction_method) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :retraction_method)) * `social_weight=1.4`: a social weight factor * `swarm_size=100`: swarm size, if it should be generated randomly -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(500)`$(_sc_any)[`StopWhenChangeLess`](@ref)`(1e-4)`: - $(_kw_stopping_criterion) -* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(500)`$(_sc(:Any))[`StopWhenChangeLess`](@ref)`(1e-4)`")) +$(_var(:Keyword, :vector_transport_method)) * `velocity`: a set of tangent vectors (of type `AbstractVector{T}`) representing the velocities of the particles, per default a random tangent vector per initial position -$(_kw_others) +$(_note(:OtherKeywords)) If you provide the objective directly, these decorations can still be specified -$(_doc_sec_output) +$(_note(:OutputSection)) """ - @doc "$(_doc_PSO)" -function particle_swarm( - M::AbstractManifold, - f; - n=nothing, - swarm_size=isnothing(n) ? 100 : n, - x0=nothing, - kwargs..., -) - !isnothing(n) && (@warn "The keyword `n` is deprecated, use `swarm_size` instead") - !isnothing(x0) && - (@warn "The keyword `x0` is deprecated, use `particle_swarm(M, f, x0)` instead") - return particle_swarm( - M, f, isnothing(x0) ? [rand(M) for _ in 1:swarm_size] : x0; kwargs... - ) -end -function particle_swarm(M::AbstractManifold, f, swarm::AbstractVector; kwargs...) - mco = ManifoldCostObjective(f) - return particle_swarm(M, mco, swarm; kwargs...) +function particle_swarm(M::AbstractManifold, f; swarm_size=100, kwargs...) + return particle_swarm(M, f, [rand(M) for _ in 1:swarm_size]; kwargs...) 
end function particle_swarm( M::AbstractManifold, f, - swarm::AbstractVector{T}; + swarm::AbstractVector; velocity::AbstractVector=[rand(M; vector_at=y) for y in swarm], kwargs..., -) where {T<:Number} - f_(M, p) = f(M, p[]) - swarm_ = [[s] for s in swarm] - velocity_ = [[v] for v in velocity] - rs = particle_swarm(M, f_, swarm_; velocity=velocity_, kwargs...) - #return just a number if the return type is the same as the type of q - return (typeof(swarm_[1]) == typeof(rs)) ? rs[] : rs +) + f_ = _ensure_mutating_cost(f, first(swarm)) + swarm_ = [_ensure_mutating_variable(s) for s in swarm] + velocity_ = [_ensure_mutating_variable(v) for v in velocity] + mco = ManifoldCostObjective(f_) + rs = particle_swarm(M, mco, swarm_; velocity=velocity_, kwargs...) + return _ensure_matching_output(first(swarm), rs) end - function particle_swarm( M::AbstractManifold, mco::O, swarm::AbstractVector; kwargs... ) where {O<:Union{AbstractManifoldCostObjective,AbstractDecoratedManifoldObjective}} diff --git a/src/solvers/primal_dual_semismooth_Newton.jl b/src/solvers/primal_dual_semismooth_Newton.jl index 39aac03a3f..d382b22da4 100644 --- a/src/solvers/primal_dual_semismooth_Newton.jl +++ b/src/solvers/primal_dual_semismooth_Newton.jl @@ -4,7 +4,7 @@ Given a `cost` function ``\mathcal E: \mathcal M → \overline{ℝ}`` of the for \mathcal E(p) = F(p) + G( Λ(p) ), ``` where ``F: \mathcal M → \overline{ℝ}``, ``G: \mathcal N → \overline{ℝ}``, -and ``\Lambda: \mathcal M → \mathcal N``. The remaining input parameters are +and ``Λ: \mathcal M → \mathcal N``. The remaining input parameters are """ _doc_PDSN = """ @@ -14,35 +14,34 @@ Perform the Primal-Dual Riemannian semismooth Newton algorithm. $(_doc_PDSN_formula) -* `p, X`: primal and dual start points ``p∈$(_l_M)`` and ``X ∈ T_n$(_l_Manifold("N"))`` -* `m,n`: base points on ``$(_l_M)`` and ``$(_l_Manifold("N"))`, respectively. 
+* `p, X`: primal and dual start points ``p∈$(_math(:M))`` and ``X ∈ T_n$(_tex(:Cal, "N"))`` +* `m,n`: base points on ``$(_math(:M))`` and ``$(_tex(:Cal, "N"))``, respectively. * `linearized_forward_operator`: the linearization ``DΛ(⋅)[⋅]`` of the operator ``Λ(⋅)``. -* `adjoint_linearized_operator`: the adjoint ``DΛ^*`` of the linearized operator ``DΛ(m): $(_l_TpM("m")) → T_{Λ(m)}$(_l_Manifold("N"))`` -* `prox_F, prox_G_Dual`: the proximal maps of ``F`` and ``G^\\ast_n`` -* `diff_prox_F, diff_prox_dual_G`: the (Clarke Generalized) differentials of the proximal maps of ``F`` and ``G^\\ast_n`` +* `adjoint_linearized_operator`: the adjoint ``DΛ^*`` of the linearized operator ``DΛ(m): $(_math(:TpM; p="m")) → $(_math(:TpM; M="N", p="Λ(m)"))`` +* `prox_F, prox_G_Dual`: the proximal maps of ``F`` and ``G^$(_tex(:ast))_n`` +* `diff_prox_F, diff_prox_dual_G`: the (Clarke Generalized) differentials of the proximal maps of ``F`` and ``G^$(_tex(:ast))_n`` For more details on the algorithm, see [DiepeveenLellmann:2021](@cite). # Keyword arguments * `dual_stepsize=1/sqrt(8)`: proximal parameter of the dual prox -* $(_kw_evaluation_default): $(_kw_evaluation) -* $(_kw_inverse_retraction_method_default): $(_kw_inverse_retraction_method) +$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :inverse_retraction_method)) * `Λ=missing`: the exact operator, that is required if `Λ(m)=n` does not hold; `missing` indicates, that the forward operator is exact. * `primal_stepsize=1/sqrt(8)`: proximal parameter of the primal prox * `reg_param=1e-5`: regularisation parameter for the Newton matrix Note that this changes the arguments the `forward_operator` is called.
-* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=[`StopAfterIteration`](@ref)`(50)`: - $(_kw_stopping_criterion) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(50)`")) * `update_primal_base=missing`: function to update `m` (identity by default/missing) * `update_dual_base=missing`: function to update `n` (identity by default/missing) -* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method) +$(_var(:Keyword, :vector_transport_method)) -$(_kw_others) +$(_note(:OtherKeywords)) -$(_doc_sec_output) +$(_note(:OutputSection)) """ @doc "$(_doc_PDSN)" @@ -137,11 +136,11 @@ function primal_dual_semismooth_Newton!( dpdmsno = decorate_objective!(M, pdmsno; kwargs...) tmp = TwoManifoldProblem(M, N, dpdmsno) pdsn = PrimalDualSemismoothNewtonState( - M, - m, - n, - p, - X; + M; + m=m, + n=n, + p=p, + X=X, primal_stepsize=primal_stepsize, dual_stepsize=dual_stepsize, regularization_parameter=reg_param, diff --git a/src/solvers/proximal_bundle_method.jl b/src/solvers/proximal_bundle_method.jl index f71cb28bce..583705ce6c 100644 --- a/src/solvers/proximal_bundle_method.jl +++ b/src/solvers/proximal_bundle_method.jl @@ -15,27 +15,27 @@ stores option values for a [`proximal_bundle_method`](@ref) solver. 
* `δ`: parameter for updating `μ`: if ``δ < 0`` then ``μ = \\log(i + 1)``, else ``μ += δ μ`` * `ε`: stepsize-like parameter related to the injectivity radius of the manifold * `η`: curvature-dependent term for updating the approximation errors -* $(_field_inv_retr) +$(_var(:Field, :inverse_retraction_method)) * `λ`: convex coefficients that solve the subproblem * `m`: the parameter to test the decrease of the cost * `μ`: (initial) proximal parameter for the subproblem * `ν`: the stopping parameter given by ``ν = - μ |d|^2 - c`` -* `p`: current candidate point +$(_var(:Field, :p; add=[:as_Iterate])) * `p_last_serious`: last serious iterate -* $(_field_retr) -* $(_field_stop) +$(_var(:Field, :retraction_method)) +$(_var(:Field, :stopping_criterion, "stop")) * `transported_subgradients`: subgradients of the bundle that are transported to `p_last_serious` -* $(_field_vector_transp) -* $(_field_subgradient) -* $(_field_sub_problem) -* $(_field_sub_state) +$(_var(:Field, :vector_transport_method)) +$(_var(:Field, :X; add=[:as_Subgradient])) +$(_var(:Field, :sub_problem)) +$(_var(:Field, :sub_state)) # Constructor - ProximalBundleMethodState(M::AbstractManifold, p=rand(M); kwargs...) + ProximalBundleMethodState(M::AbstractManifold, sub_problem, sub_state; kwargs...) + ProximalBundleMethodState(M::AbstractManifold, sub_problem=proximal_bundle_method_subsolver; evaluation=AllocatingEvaluation(), kwargs...) Generate the state for the [`proximal_bundle_method`](@ref) on the manifold `M` -with initial point `p`. # Keyword arguments @@ -45,13 +45,14 @@ with initial point `p`. 
* `ε=1e-2` * `μ=0.5` * `m=0.0125` -* $(_kw_inverse_retraction_method_default): $(_kw_inverse_retraction_method) -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopWhenLagrangeMultiplierLess`](@ref)`(1e-8)`$(_sc_any)[`StopAfterIteration`](@ref)`(5000)` -* `sub_problem=`[`proximal_bundle_method_subsolver`](@ref) -* `sub_state=`[`AllocatingEvaluation`](@ref) -* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method) -* `X=`$(_link_zero_vector()) specify the type of tangent vector to use. +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :inverse_retraction_method)) +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :stopping_criterion; default="[`StopWhenLagrangeMultiplierLess`](@ref)`(1e-8)`$(_sc(:Any))[`StopAfterIteration`](@ref)`(5000)`")) +$(_var(:Keyword, :sub_problem; default="[`proximal_bundle_method_subsolver`](@ref)")) +$(_var(:Keyword, :sub_state; default="[`AllocatingEvaluation`](@ref)")) +$(_var(:Keyword, :vector_transport_method)) +* `X=`$(_link(:zero_vector)) specify the type of tangent vector to use.
""" mutable struct ProximalBundleMethodState{ P, @@ -91,7 +92,9 @@ mutable struct ProximalBundleMethodState{ sub_state::St function ProximalBundleMethodState( M::TM, - p::P; + sub_problem::Pr, + sub_state::St; + p::P=rand(M), m::R=0.0125, inverse_retraction_method::IR=default_inverse_retraction_method(M, typeof(p)), retraction_method::TR=default_retraction_method(M, typeof(p)), @@ -104,12 +107,11 @@ mutable struct ProximalBundleMethodState{ ε::R=1e-2, δ::R=1.0, μ::R=0.5, - sub_problem::Pr=proximal_bundle_method_subsolver, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}=AllocatingEvaluation(), ) where { P, T, - Pr, + Pr<:Union{AbstractManoptProblem,F} where {F}, + St<:AbstractManoptSolverState, R<:Real, IR<:AbstractInverseRetractionMethod, TM<:AbstractManifold, @@ -125,12 +127,11 @@ mutable struct ProximalBundleMethodState{ d = copy(M, p, X) lin_errors = [zero(R)] transported_subgradients = [copy(M, p, X)] - sub_state_storage = maybe_wrap_evaluation_type(sub_state) α = zero(R) λ = [zero(R)] η = zero(R) ν = zero(R) - return new{P,T,Pr,typeof(sub_state_storage),R,IR,TR,SC,VT}( + return new{P,T,Pr,St,R,IR,TR,SC,VT}( approx_errors, bundle, c, @@ -155,10 +156,20 @@ mutable struct ProximalBundleMethodState{ μ, ν, sub_problem, - sub_state_storage, + sub_state, ) end end +function ProximalBundleMethodState( + M::AbstractManifold, + sub_problem=proximal_bundle_method_subsolver; + evaluation::E=AllocatingEvaluation(), + kwargs..., +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return ProximalBundleMethodState(M, sub_problem, cfs; kwargs...) +end + get_iterate(pbms::ProximalBundleMethodState) = pbms.p_last_serious function set_iterate!(pbms::ProximalBundleMethodState, M, p) copyto!(M, pbms.p_last_serious, p) @@ -205,11 +216,11 @@ with ``X_{q_j} ∈ ∂f(q_j)``, ``p_k`` the last serious iterate, sub solver, see for example the [`proximal_bundle_method_subsolver`](@ref). 
""" _doc_PBM = """ - proximal_bundle_method(M, f, ∂f, p, kwargs...) + proximal_bundle_method(M, f, ∂f, p=rand(M), kwargs...) proximal_bundle_method!(M, f, ∂f, p, kwargs...) -perform a proximal bundle method ``p^{(k+1)} = $(_l_retr)_{p^{(k)}}(-d_k)``, -where ``$(_l_retr)`` is a retraction and +perform a proximal bundle method ``p^{(k+1)} = $(_tex(:retr))_{p^{(k)}}(-d_k)``, +where ``$(_tex(:retr))`` is a retraction and $(_doc_PBM_dk) @@ -220,10 +231,10 @@ For more details see [HoseiniMonjeziNobakhtianPouryayevali:2021](@cite). # Input -* $(_arg_M) -* $(_arg_f) -* $(_arg_subgrad_f) -* $(_arg_p) +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :subgrad_f, _var(:subgrad_f, :symbol))) +$(_var(:Argument, :p)) # Keyword arguments @@ -231,25 +242,24 @@ For more details see [HoseiniMonjeziNobakhtianPouryayevali:2021](@cite). * `bundle_size=50`: the maximal size of the bundle * `δ=1.0`: parameter for updating `μ`: if ``δ < 0`` then ``μ = \\log(i + 1)``, else ``μ += δ μ`` * `ε=1e-2`: stepsize-like parameter related to the injectivity radius of the manifold -* $(_kw_evaluation_default): $(_kw_evaluation) -* $(_kw_inverse_retraction_method_default): $(_kw_inverse_retraction_method) +$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :inverse_retraction_method)) * `m=0.0125`: a real number that controls the decrease of the cost function * `μ=0.5`: initial proximal parameter for the subproblem -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopWhenLagrangeMultiplierLess`](@ref)`(1e-8)`$(_sc_any)[`StopAfterIteration`](@ref)`(5000)`: - $(_kw_stopping_criterion) -* `sub_problem=`[`proximal_bundle_method_subsolver`](@ref) -* `sub_state=`[`AllocatingEvaluation`](@ref) -* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; 
default="[`StopWhenLagrangeMultiplierLess`](@ref)`(1e-8)`$(_sc(:Any))[`StopAfterIteration`](@ref)`(5000)`")) +$(_var(:Keyword, :sub_problem; default="[`proximal_bundle_method_subsolver`](@ref)")) +$(_var(:Keyword, :sub_state; default="[`AllocatingEvaluation`](@ref)")) +$(_var(:Keyword, :vector_transport_method)) -$(_kw_others) +$(_note(:OtherKeywords)) -$(_doc_sec_output) +$(_note(:OutputSection)) """ @doc "$(_doc_PBM)" function proximal_bundle_method( - M::AbstractManifold, f::TF, ∂f::TdF, p; kwargs... + M::AbstractManifold, f::TF, ∂f::TdF, p=rand(M); kwargs... ) where {TF,TdF} p_star = copy(M, p) return proximal_bundle_method!(M, f, ∂f, p_star; kwargs...) @@ -284,7 +294,9 @@ function proximal_bundle_method!( sub_state_storage = maybe_wrap_evaluation_type(sub_state) pbms = ProximalBundleMethodState( M, - p; + sub_problem, + maybe_wrap_evaluation_type(sub_state); + p=p, m=m, inverse_retraction_method=inverse_retraction_method, retraction_method=retraction_method, @@ -295,8 +307,6 @@ function proximal_bundle_method!( ε=ε, δ=δ, μ=μ, - sub_problem=sub_problem, - sub_state=sub_state_storage, ) pbms = decorate_state!(pbms; kwargs...) return get_solver_return(solve!(mp, pbms)) diff --git a/src/solvers/quasi_Newton.jl b/src/solvers/quasi_Newton.jl index bae0128676..10240416b8 100644 --- a/src/solvers/quasi_Newton.jl +++ b/src/solvers/quasi_Newton.jl @@ -10,14 +10,14 @@ all necessary fields. * `η`: the current update direction * `nondescent_direction_behavior`: a `Symbol` to specify how to handle direction that are not descent ones.
* `nondescent_direction_value`: the value from the last inner product from checking for descent directions -* $(_field_p) +$(_var(:Field, :p; add=[:as_Iterate])) * `p_old`: the last iterate * `sk`: the current step * `yk`: the current gradient difference -* $(_field_retr) -* $(_field_step) -* $(_field_stop) -* $(_field_gradient) +$(_var(:Field, :retraction_method)) +$(_var(:Field, :stepsize)) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :X; add=[:as_Gradient])) * `X_old`: the last gradient @@ -30,12 +30,11 @@ Generate the Quasi Newton state on the manifold `M` with start point `p`. ## Keyword arguments * `direction_update=`[`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref)`(M, p, InverseBFGS(), 20; vector_transport_method=vector_transport_method)` -* `stopping_criterion=`[`StopAfterIteration`9(@ref)`(1000)`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(1e-6)` -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stepsize=default_stepsize(M; QuasiNewtonState)`: $(_kw_stepsize) - The default here is the [`WolfePowellLinesearch`](@ref) using the keywords `retraction_method` and `vector_transport_method` -* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method) -* $(_kw_X_default) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(1000)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-6)`")) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, QuasiNewtonState)`")) +$(_var(:Keyword, :vector_transport_method)) +$(_var(:Keyword, :X; add=:as_Memory)) # See also @@ -67,8 +66,8 @@ mutable struct QuasiNewtonState{ nondescent_direction_value::R end function QuasiNewtonState( - M::AbstractManifold, - p::P; + M::AbstractManifold; + p::P=rand(M), initial_vector::T=zero_vector(M, p), # deprecated X::T=initial_vector, vector_transport_method::VTM=default_vector_transport_method(M, typeof(p)), @@ -161,43 +160,33 @@ function
set_gradient!(qns::QuasiNewtonState, M, p, X) copyto!(M, qns.X, p, X) return qns end -function default_stepsize( - M::AbstractManifold, - ::Type{QuasiNewtonState}; - vector_transport_method=default_vector_transport_method(M), - retraction_method=default_retraction_method(M), -) - return WolfePowellLinesearch( - M; - retraction_method=retraction_method, - vector_transport_method=vector_transport_method, - linesearch_stopsize=1e-10, - ) +function default_stepsize(M::AbstractManifold, ::Type{QuasiNewtonState}; kwargs...) + return Manopt.WolfePowellLinesearchStepsize(M; stop_when_stepsize_less=1e-10, kwargs...) end -_doc_QN_init_scaling = raw"``\frac{⟨s_k,y_k⟩_{p_k}}{\lVert y_k\rVert_{p_k}}``" +_doc_QN_init_scaling = raw"``\frac{s⟨s_k,y_k⟩_{p_k}}{\lVert y_k\rVert_{p_k}}``" _doc_QN = """ quasi_Newton(M, f, grad_f, p; kwargs...) quasi_Newton!(M, f, grad_f, p; kwargs...) Perform a quasi Newton iteration to solve -$(_problem_default) +$(_problem(:Default)) with start point `p`. The iterations can be done in-place of `p```=p^{(0)}``. The ``k``th iteration consists of -1. Compute the search direction ``η^{(k)} = -$(_l_cal("B"))_k [$(_l_grad)f (p^{(k)})]`` or solve ``$(_l_cal("H"))_k [η^{(k)}] = -$(_l_grad)f (p^{(k)})]``. +1. Compute the search direction ``η^{(k)} = -$(_tex(:Cal, "B"))_k [$(_tex(:grad))f (p^{(k)})]`` or solve ``$(_tex(:Cal, "H"))_k [η^{(k)}] = -$(_tex(:grad))f (p^{(k)})]``. 2. Determine a suitable stepsize ``α_k`` along the curve ``γ(α) = R_{p^{(k)}}(α η^{(k)})``, usually by using [`WolfePowellLinesearch`](@ref). 3. Compute ``p^{(k+1)} = R_{p^{(k)}}(α_k η^{(k)})``. -4. Define ``s_k = $(_l_cal("T"))_{p^{(k)}, α_k η^{(k)}}(α_k η^{(k)})`` and ``y_k = $(_l_grad)f(p^{(k+1)}) - $(_l_cal("T"))_{p^{(k)}, α_k η^{(k)}}($(_l_grad)f(p^{(k)}))``, where ``$(_l_cal("T"))`` denotes a vector transport. +4. 
Define ``s_k = $(_tex(:Cal, "T"))_{p^{(k)}, α_k η^{(k)}}(α_k η^{(k)})`` and ``y_k = $(_tex(:grad))f(p^{(k+1)}) - $(_tex(:Cal, "T"))_{p^{(k)}, α_k η^{(k)}}($(_tex(:grad))f(p^{(k)}))``, where ``$(_tex(:Cal, "T"))`` denotes a vector transport. 5. Compute the new approximate Hessian ``H_{k+1}`` or its inverse ``B_{k+1}``. # Input -$(_arg_M) -$(_arg_f) -$(_arg_grad_f) -$(_arg_p) +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :grad_f)) +$(_var(:Argument, :p)) # Keyword arguments @@ -212,13 +201,13 @@ $(_arg_p) and strictly increasing at ``0`` * `direction_update=`[`InverseBFGS`](@ref)`()`: the [`AbstractQuasiNewtonUpdateRule`](@ref) to use. -* $(_kw_evaluation_default): - $(_kw_evaluation) - $(_kw_evaluation_example) -* `initial_operator=Matrix{Float64}(I, n, n)`: +$(_var(:Keyword, :evaluation; add=:GradientExample)) +* `initial_operator= initial_scale*Matrix{Float64}(I, n, n)`: initial matrix to use in case the Hessian (inverse) approximation is stored as a full matrix, that is `n=manifold_dimension(M)`. This matrix is only allocated for the full matrix case. - See also `scale_initial_operator`. + See also `initial_scale`. +* `initial_scale=1.0`: the initial scale ``s`` to use with $(_doc_QN_init_scaling) in the computation of the limited memory approach. + See also `initial_operator`. * `memory_size=20`: limited memory, number of ``s_k, y_k`` to store. Set to a negative value to use a full memory (matrix) representation * `nondescent_direction_behavior=:reinitialize_direction_update`: @@ -228,19 +217,16 @@ $(_arg_p) * `:reinitialize_direction_update`: discards operator state stored in direction update rules. * any other value performs the verification, keeps the direction but stores a message. A stored message can be displayed using [`DebugMessages`](@ref).
-* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `scale_initial_operator=true`: scale initial operator with $(_doc_QN_init_scaling) in the computation -* `stabilize=true`: stabilize the method numerically by projecting computed (Newton-) - directions to the tangent space to reduce numerical errors -* `stepsize=`[`WolfePowellLinesearch`](@ref)`(retraction_method, vector_transport_method)`: - $(_kw_stepsize) -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(max(1000, memory_size))`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(1e-6)`: - $(_kw_stopping_criterion) -* $(_kw_vector_transport_method_default): $(_kw_vector_transport_method) - -$(_kw_others) - -$(_doc_sec_output) +* `project!=copyto!`: for numerical stability it is possible to project onto the tangent space after every iteration. + the function has to work inplace of `Y`, that is `(M, Y, p, X) -> Y`, where `X` and `Y` can be the same memory. +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stepsize; default="[`WolfePowellLinesearch`](@ref)`(retraction_method, vector_transport_method)`")) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(max(1000, memory_size))`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-6)`")) +$(_var(:Keyword, :vector_transport_method)) + +$(_note(:OtherKeywords)) + +$(_note(:OutputSection)) """ @doc "$(_doc_QN)" @@ -248,28 +234,16 @@ function quasi_Newton( M::AbstractManifold, f::TF, grad_f::TDF, - p; + p=rand(M); evaluation::AbstractEvaluationType=AllocatingEvaluation(), kwargs..., ) where {TF,TDF} - mgo = ManifoldGradientObjective(f, grad_f; evaluation=evaluation) - return quasi_Newton(M, mgo, p; kwargs...) 
-end -function quasi_Newton( - M::AbstractManifold, - f::TF, - grad_f::TDF, - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) where {TF,TDF} - # redefine initial point - q = [p] - f_(M, p) = f(M, p[]) - grad_f_ = _to_mutating_gradient(grad_f, evaluation) - rs = quasi_Newton(M, f_, grad_f_, q; evaluation=AllocatingEvaluation(), kwargs...) - #return just a number if the return type is the same as the type of q - return (typeof(q) == typeof(rs)) ? rs[] : rs + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation) + mgo = ManifoldGradientObjective(f_, grad_f_; evaluation=evaluation) + rs = quasi_Newton(M, mgo, p_; kwargs...) + return _ensure_matching_output(p, rs) end function quasi_Newton( M::AbstractManifold, mgo::O, p; kwargs... @@ -305,7 +279,7 @@ function quasi_Newton!( basis::AbstractBasis=DefaultOrthonormalBasis(), direction_update::AbstractQuasiNewtonUpdateRule=InverseBFGS(), memory_size::Int=min(manifold_dimension(M), 20), - stabilize::Bool=true, + (project!)=copyto!, initial_operator::AbstractMatrix=( if memory_size >= 0 fill(1.0, 0, 0) # don't allocate `initial_operator` for limited memory operation @@ -313,8 +287,8 @@ function quasi_Newton!( Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M)) end ), - scale_initial_operator::Bool=true, - stepsize::Stepsize=default_stepsize( + initial_scale::Real=1.0, + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=default_stepsize( M, QuasiNewtonState; retraction_method=retraction_method, @@ -330,8 +304,8 @@ function quasi_Newton!( p, direction_update, memory_size; - scale=scale_initial_operator, - project=stabilize, + initial_scale=initial_scale, + (project!)=project!, vector_transport_method=vector_transport_method, ) else @@ -340,7 +314,7 @@ function quasi_Newton!( direction_update, basis, initial_operator; - scale=scale_initial_operator, + initial_scale=initial_scale, 
vector_transport_method=vector_transport_method, ) end @@ -352,12 +326,12 @@ function quasi_Newton!( dmgo = decorate_objective!(M, mgo; debug=debug, kwargs...) mp = DefaultManoptProblem(M, dmgo) qns = QuasiNewtonState( - M, - p; + M; + p=p, initial_vector=get_gradient(mp, p), direction_update=local_dir_upd, stopping_criterion=stopping_criterion, - stepsize=stepsize, + stepsize=_produce_type(stepsize, M), retraction_method=retraction_method, vector_transport_method=vector_transport_method, ) @@ -451,8 +425,8 @@ function update_hessian!( yk_c = get_coordinates(M, p, st.yk, d.basis) sk_c = get_coordinates(M, p, st.sk, d.basis) skyk_c = inner(M, p, st.sk, st.yk) - if iter == 1 && d.scale == true - d.matrix = skyk_c / inner(M, p, st.yk, st.yk) * d.matrix + if iter == 1 + d.matrix = d.initial_scale * skyk_c / inner(M, p, st.yk, st.yk) * d.matrix end d.matrix = (I - sk_c * yk_c' / skyk_c) * d.matrix * (I - yk_c * sk_c' / skyk_c) + @@ -470,8 +444,8 @@ function update_hessian!( yk_c = get_coordinates(M, p, st.yk, d.basis) sk_c = get_coordinates(M, p, st.sk, d.basis) skyk_c = inner(M, p, st.sk, st.yk) - if iter == 1 && d.scale == true - d.matrix = inner(M, p, st.yk, st.yk) / skyk_c * d.matrix + if iter == 1 + d.matrix = d.initial_scale * inner(M, p, st.yk, st.yk) / skyk_c * d.matrix end d.matrix = d.matrix + yk_c * yk_c' / skyk_c - @@ -493,8 +467,8 @@ function update_hessian!( yk_c = get_coordinates(M, p, st.yk, d.basis) sk_c = get_coordinates(M, p, st.sk, d.basis) skyk_c = inner(M, p, st.sk, st.yk) - if iter == 1 && d.scale == true - d.matrix = inner(M, p, st.sk, st.sk) / skyk_c * d.matrix + if iter == 1 + d.matrix = d.initial_scale * inner(M, p, st.sk, st.sk) / skyk_c * d.matrix end d.matrix = d.matrix + sk_c * sk_c' / skyk_c - @@ -516,8 +490,8 @@ function update_hessian!( yk_c = get_coordinates(M, p, st.yk, d.basis) sk_c = get_coordinates(M, p, st.sk, d.basis) skyk_c = inner(M, p, st.sk, st.yk) - if iter == 1 && d.scale == true - d.matrix = skyk_c / inner(M, p, st.sk, 
st.sk) * d.matrix + if iter == 1 + d.matrix = d.initial_scale * skyk_c / inner(M, p, st.sk, st.sk) * d.matrix end d.matrix = (I - yk_c * sk_c' / skyk_c) * d.matrix * (I - sk_c * yk_c' / skyk_c) + diff --git a/src/solvers/stochastic_gradient_descent.jl b/src/solvers/stochastic_gradient_descent.jl index 53137d1f35..d647f83cde 100644 --- a/src/solvers/stochastic_gradient_descent.jl +++ b/src/solvers/stochastic_gradient_descent.jl @@ -6,30 +6,32 @@ see also [`ManifoldStochasticGradientObjective`](@ref) and [`stochastic_gradient # Fields -* $(_field_iterate) +$(_var(:Field, :p; add=[:as_Iterate])) * `direction`: a direction update to use -* $(_field_stop) -* $(_field_step) +$(_var(:Field, :stopping_criterion, "stop")) +$(_var(:Field, :stepsize)) * `evaluation_order`: specify whether to use a randomly permuted sequence (`:FixedRandom`:), a per cycle permuted sequence (`:Linear`) or the default, a `:Random` sequence. * `order`: stores the current permutation -* $(_field_retr) +$(_var(:Field, :retraction_method)) # Constructor - StochasticGradientDescentState(M, p, X; kwargs...) + StochasticGradientDescentState(M::AbstractManifold; kwargs...) Create a `StochasticGradientDescentState` with start point `p`. 
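For a positive `memory_size`, the dense matrix above is never formed; the direction is assembled from the stored pairs ``(s_i, y_i)`` by the classical two-loop recursion. A minimal Euclidean sketch (assuming identity vector transports; `lbfgs_direction` is a hypothetical helper, not Manopt.jl API):

```julia
using LinearAlgebra

# Compute the quasi-Newton direction -B_k grad from the last stored pairs,
# with the same first-iteration scaling as the dense updates above.
function lbfgs_direction(grad::Vector, S::Vector{<:Vector}, Y::Vector{<:Vector}; initial_scale=1.0)
    m = length(S)
    ρ = [1 / dot(S[i], Y[i]) for i in 1:m]
    α = zeros(m)
    q = copy(grad)
    for i in m:-1:1                 # first loop: newest to oldest pair
        α[i] = ρ[i] * dot(S[i], q)
        q .-= α[i] .* Y[i]
    end
    γ = m > 0 ? initial_scale * dot(S[m], Y[m]) / dot(Y[m], Y[m]) : initial_scale
    r = γ .* q                      # apply the (scaled) initial operator
    for i in 1:m                    # second loop: oldest to newest pair
        β = ρ[i] * dot(Y[i], r)
        r .+= (α[i] - β) .* S[i]
    end
    return -r                       # search direction
end
```

With an empty memory this reduces to (scaled) steepest descent, and with one stored pair it reproduces a single dense inverse BFGS update.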
# Keyword arguments -* `direction=`[`StochasticGradient`](@ref)`($(_link_zero_vector())) +* `direction=`[`StochasticGradientRule`](@ref)`(M, $(_link(:zero_vector)))` +* `order_type=:RandomOrder` +* `order=Int[]`: specify how to store the order of indices for the next epoch -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(1000)`: $(_kw_stopping_criterion) -* `stepsize=`[`default_stepsize`[@ref)`(M, StochasticGradientDescentState)`: $(_kw_stepsize) - This default is the [`ConstantStepsize`](@ref)`(M)` +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(1000)`")) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, StochasticGradientDescentState)`")) +$(_var(:Keyword, :X; add=:as_Memory)) + """ mutable struct StochasticGradientDescentState{ TX, @@ -51,10 +53,10 @@ mutable struct StochasticGradientDescentState{ end function StochasticGradientDescentState( - M::AbstractManifold, - p::P, - X::Q; - direction::D=StochasticGradient(zero_vector(M, p)), + M::AbstractManifold; + p::P=rand(M), + X::T=zero_vector(M, p), + direction::D=StochasticGradientRule(M; X=copy(M, p, X)), order_type::Symbol=:RandomOrder, order::Vector{<:Int}=Int[], retraction_method::RM=default_retraction_method(M, typeof(p)), @@ -62,13 +64,13 @@ function StochasticGradientDescentState( stepsize::S=default_stepsize(M, StochasticGradientDescentState), ) where { P, - Q, + T, D<:DirectionUpdateRule, RM<:AbstractRetractionMethod, SC<:StoppingCriterion, S<:Stepsize, } - return StochasticGradientDescentState{P,Q,D,SC,S,RM}( + return StochasticGradientDescentState{P,T,D,SC,S,RM}( p, X, direction, @@ -101,38 +103,68 @@ function show(io::IO, sgds::StochasticGradientDescentState) return print(io, s) end """ - StochasticGradient <: AbstractGradientGroupProcessor + StochasticGradientRule <: AbstractGradientGroupDirectionRule + +Create 
a functor `(problem, state, k) -> (s, X)` to evaluate the stochastic gradient, +that is, choose a random index from the `state` and use the internal field for +in-place evaluation of the gradient. The default gradient processor, which just evaluates the (stochastic) gradient or a subset thereof. # Fields -* `dir::T`: a storage for a tangent vector. +$(_var(:Field, :X)) # Constructor - StochasticGradient(M::AbstractManifold; p=rand(M), X=zero_vector(M, p)) + StochasticGradientRule(M::AbstractManifold; p=rand(M), X=zero_vector(M, p)) -Initialize the stochastic Gradient processor with tangent vector type of `X`, +Initialize the stochastic gradient processor with tangent vector type of `X`, where both `M` and `p` are just help variables. + +# See also +[`stochastic_gradient_descent`](@ref), [`StochasticGradient`](@ref) """ -struct StochasticGradient{T} <: AbstractGradientGroupProcessor - dir::T +struct StochasticGradientRule{T} <: AbstractGradientGroupDirectionRule + X::T end -function StochasticGradient(M::AbstractManifold; p=rand(M), X=zero_vector(M, p)) - return StochasticGradient{typeof(X)}(X) +function StochasticGradientRule( + M::AbstractManifold; p=rand(M), X::T=zero_vector(M, p) +) where {T} + return StochasticGradientRule{T}(X) end -function (sg::StochasticGradient)( - apm::AbstractManoptProblem, sgds::StochasticGradientDescentState, iter +function (sg::StochasticGradientRule)( + apm::AbstractManoptProblem, sgds::StochasticGradientDescentState, k ) # for each new epoch choose new order if at random order ((sgds.k == 1) && (sgds.order_type == :Random)) && shuffle!(sgds.order) # the gradient to choose, either from the order or completely random j = sgds.order_type == :Random ? rand(1:length(sgds.order)) : sgds.order[sgds.k] - return sgds.stepsize(apm, sgds, iter), get_gradient!(apm, sg.dir, sgds.p, j) + return sgds.stepsize(apm, sgds, k), get_gradient!(apm, sg.X, sgds.p, j) +end + +@doc """ + StochasticGradient(; kwargs...) 
+ StochasticGradient(M::AbstractManifold; kwargs...) + +# Keyword arguments + +$(_var(:Keyword, :X, "initial_gradient")) +$(_var(:Keyword, :p; add=:as_Initial)) + +$(_note(:ManifoldDefaultFactory, "StochasticGradientRule")) +""" +function StochasticGradient(args...; kwargs...) + return ManifoldDefaultsFactory(Manopt.StochasticGradientRule, args...; kwargs...) end +""" + default_stepsize(M::AbstractManifold, ::Type{StochasticGradientDescentState}) + +Define the default step size computed for the [`StochasticGradientDescentState`](@ref), +which is [`ConstantStepsize`](@ref)`(M)`. +""" function default_stepsize(M::AbstractManifold, ::Type{StochasticGradientDescentState}) return ConstantStepsize(M) end @@ -147,10 +179,10 @@ perform a stochastic gradient descent. This can be performed in-place of `p`. # Input -$(_arg_M) +$(_var(:Argument, :M; type=true)) * `grad_f`: a gradient function that either returns a vector of the gradients or is a vector of gradient functions -$(_arg_p) +$(_var(:Argument, :p)) alternatively to the gradient you can provide a [`ManifoldStochasticGradientObjective`](@ref) `msgo`; then using the `cost=` keyword does not have any effect, since the cost is already part of the objective. @@ -158,20 +190,20 @@ then using the `cost=` keyword does not have any effect since if so, the cost is # Keyword arguments * `cost=missing`: you can provide a cost function, for example to track the function value -* `direction=`[`StochasticGradient`](@ref)`($(_link_zero_vector())) -* $(_kw_evaluation_default): $(_kw_evaluation) +* `direction=`[`StochasticGradient`](@ref)`($(_link(:zero_vector)))` +$(_var(:Keyword, :evaluation)) * `evaluation_order=:Random`: specify whether to use a randomly permuted sequence (`:FixedRandom`), a per-cycle permuted sequence (`:Linear`) or the default `:Random` one. * `order_type=:RandomOrder`: a type of ordering of gradient evaluations. 
Possible values are `:RandomOrder`, a `:FixedPermutation`, `:LinearOrder` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(1000)`: $(_kw_stopping_criterion) -* `stepsize=`[`default_stepsize`[@ref)`(M, StochasticGradientDescentState)`: $(_kw_stepsize) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(1000)`")) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, StochasticGradientDescentState)`")) * `order=[1:n]`: the initial permutation, where `n` is the number of gradients in `gradF`. -* $(_kw_retraction_method_default): $(_kw_retraction_method) +$(_var(:Keyword, :retraction_method)) -$(_kw_others) +$(_note(:OtherKeywords)) -$(_doc_sec_output) +$(_note(:OutputSection)) """ @doc "$(_doc_SGD)" @@ -187,34 +219,21 @@ function stochastic_gradient_descent( evaluation::AbstractEvaluationType=AllocatingEvaluation(), kwargs..., ) - msgo = ManifoldStochasticGradientObjective(grad_f; cost=cost, evaluation=evaluation) - return stochastic_gradient_descent(M, msgo, p; evaluation=evaluation, kwargs...) -end -function stochastic_gradient_descent( - M::AbstractManifold, - grad_f, - p::Number; - cost=Missing(), - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) - q = [p] - f_ = ismissing(cost) ? cost : (M, p) -> cost(M, p[]) - if grad_f isa Function - n = grad_f(M, p) isa Number - grad_f_ = (M, p) -> [[X] for X in (n ? [grad_f(M, p[])] : grad_f(M, p[]))] - else - if evaluation isa AllocatingEvaluation - grad_f_ = [(M, p) -> [f(M, p[])] for f in grad_f] + p_ = _ensure_mutating_variable(p) + cost_ = _ensure_mutating_cost(cost, p) + if p isa Number + if grad_f isa Function + n = grad_f(M, p) isa Number + grad_f_ = (M, p) -> [[X] for X in (n ? 
[grad_f(M, p[])] : grad_f(M, p[]))] else - grad_f_ = [(M, X, p) -> (X .= [f(M, p[])]) for f in grad_f] + grad_f_ = [_ensure_mutating_gradient(f, p, evaluation) for f in grad_f] end + else + grad_f_ = grad_f end - rs = stochastic_gradient_descent( - M, grad_f_, q; cost=f_, evaluation=evaluation, kwargs... - ) - #return just a number if the return type is the same as the type of q - return (typeof(q) == typeof(rs)) ? rs[] : rs + msgo = ManifoldStochasticGradientObjective(grad_f_; cost=cost_, evaluation=evaluation) + rs = stochastic_gradient_descent(M, msgo, p_; evaluation=evaluation, kwargs...) + return _ensure_matching_output(p, rs) end function stochastic_gradient_descent( M::AbstractManifold, msgo::O, p; kwargs... @@ -240,10 +259,14 @@ function stochastic_gradient_descent!( M::AbstractManifold, msgo::O, p; - direction::DirectionUpdateRule=StochasticGradient(zero_vector(M, p)), + direction::Union{<:DirectionUpdateRule,ManifoldDefaultsFactory}=StochasticGradient(; + p=p + ), stopping_criterion::StoppingCriterion=StopAfterIteration(10000) | StopWhenGradientNormLess(1e-9), - stepsize::Stepsize=default_stepsize(M, StochasticGradientDescentState), + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=default_stepsize( + M, StochasticGradientDescentState + ), order=collect(1:length(get_gradients(M, msgo, p))), order_type::Symbol=:Random, retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), @@ -252,12 +275,12 @@ function stochastic_gradient_descent!( dmsgo = decorate_objective!(M, msgo; kwargs...) 
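The solver loop that this state and direction rule drive can be sketched in the Euclidean case: per iteration pick one component gradient and take a constant-length step. All names below are hypothetical illustrations, not Manopt.jl API:

```julia
using Random, LinearAlgebra

# Euclidean sketch (assumption): `grad_fs` is a vector of component gradients
# and the "retraction" is a plain vector step, mirroring the `:Random` order.
function stochastic_gradient_descent_sketch(grad_fs, p; stepsize=0.1, iterations=1000, rng=Random.default_rng())
    for _ in 1:iterations
        j = rand(rng, 1:length(grad_fs))    # choose one component gradient at random
        p = p .- stepsize .* grad_fs[j](p)  # constant-length step along its negative
    end
    return p
end

# the mean of the anchors minimizes f(p) = 1/2 Σ_i ‖p - a_i‖², here [1.0, 1.0]
anchors = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
grads = [p -> p .- a for a in anchors]
p_opt = stochastic_gradient_descent_sketch(
    grads, [5.0, 5.0]; stepsize=0.01, iterations=20000, rng=Random.MersenneTwister(42)
)
```

With a constant step size the iterates only fluctuate around the minimizer, which is why the package also offers decreasing step-size rules.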
mp = DefaultManoptProblem(M, dmsgo) sgds = StochasticGradientDescentState( - M, - p, - zero_vector(M, p); - direction=direction, + M; + p=p, + X=zero_vector(M, p), + direction=_produce_type(direction, M), stopping_criterion=stopping_criterion, - stepsize=stepsize, + stepsize=_produce_type(stepsize, M), order_type=order_type, order=order, retraction_method=retraction_method, diff --git a/src/solvers/subgradient.jl b/src/solvers/subgradient.jl index 6424c5f1b3..a86cf047b4 100644 --- a/src/solvers/subgradient.jl +++ b/src/solvers/subgradient.jl @@ -5,26 +5,26 @@ stores option values for a [`subgradient_method`](@ref) solver # Fields -* $(_field_p) +$(_var(:Field, :p; add=[:as_Iterate])) * `p_star`: optimal value -* $(_field_retr) -* $(_field_step) -* $(_field_stop) +$(_var(:Field, :retraction_method)) +$(_var(:Field, :stepsize)) +$(_var(:Field, :stopping_criterion, "stop")) * `X`: the current element from the possible subgradients at `p` that was last evaluated. # Constructor - SubGradientMethodState(M::AbstractManifold, p; kwargs...) + SubGradientMethodState(M::AbstractManifold; kwargs...) 
-Initialise the Subgradient method state to initial point `p` +Initialise the Subgradient method state # Keyword arguments -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stepsize=`[`default_stepsize`](@ref)`(M, SubGradientMethodState)`, - which here defaults to [`ConstantStepsize`](@ref)`(M)` -* `stopping_criterion=[`StopAfterIteration`](@ref)`(5000)`: $(_kw_stopping_criterion) -* $(_kw_X_default): $(_kw_X) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :p; add=:as_Initial)) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, SubGradientMethodState)`")) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(5000)`")) +$(_var(:Keyword, :X; add=:as_Memory)) """ mutable struct SubGradientMethodState{ TR<:AbstractRetractionMethod,TS<:Stepsize,TSC<:StoppingCriterion,P,T @@ -36,8 +36,8 @@ mutable struct SubGradientMethodState{ stop::TSC X::T function SubGradientMethodState( - M::TM, - p::P; + M::TM; + p::P=rand(M), stopping_criterion::SC=StopAfterIteration(5000), stepsize::S=default_stepsize(M, SubGradientMethodState), X::T=zero_vector(M, p), @@ -90,8 +90,8 @@ _doc_SGM = """ subgradient_method!(M, f, ∂f, p; kwargs...) subgradient_method!(M, sgo, p; kwargs...) -perform a subgradient method ``p^{(k+1)} = $(_l_retr)\\bigl(p^{(k)}, s^{(k)}∂f(p^{(k)})\\bigr)``, -where ``$(_l_retr)`` is a retraction, ``s^{(k)}`` is a step size. +perform a subgradient method ``p^{(k+1)} = $(_tex(:retr))\\bigl(p^{(k)}, s^{(k)}∂f(p^{(k)})\\bigr)``, +where ``$(_tex(:retr))`` is a retraction, ``s^{(k)}`` is a step size. Though the subgradient might be set valued, the argument `∂f` should always return _one_ element from the subgradient, but @@ -100,21 +100,20 @@ For more details see [FerreiraOliveira:1998](@cite). 
# Input -$(_arg_M) -$(_arg_f) -* `∂f`: the (sub)gradient ``∂ f: $(_l_M) → T$(_l_M)`` of f -$(_arg_p) +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +* `∂f`: the (sub)gradient ``∂ f: $(_math(:M)) → T$(_math(:M))`` of f +$(_var(:Argument, :p)) alternatively to `f` and `∂f` a [`ManifoldSubgradientObjective`](@ref) `sgo` can be provided. -# Optional +# Keyword arguments -* $(_kw_evaluation_default): $(_kw_evaluation) -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stepsize=`[`default_stepsize`](@ref)`(M, SubGradientMethodState)`: $(_kw_stepsize) - which here defaults to [`ConstantStepsize`](@ref)`(M)` -* `stopping_criterion=[`StopAfterIteration`](@ref)`(5000)`: $(_kw_stopping_criterion) -* $(_kw_X_default): $(_kw_X) +$(_var(:Keyword, :evaluation)) +$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stepsize; default="[`default_stepsize`](@ref)`(M, SubGradientMethodState)`")) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(5000)`")) +$(_var(:Keyword, :X; add=:as_Memory)) and the ones that are passed to [`decorate_state!`](@ref) for decorators. @@ -136,23 +135,12 @@ function subgradient_method( evaluation::AbstractEvaluationType=AllocatingEvaluation(), kwargs..., ) - sgo = ManifoldSubgradientObjective(f, ∂f; evaluation=evaluation) - return subgradient_method(M, sgo, p; evaluation=evaluation, kwargs...) -end -function subgradient_method( - M::AbstractManifold, - f, - ∂f, - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) - q = [p] - f_(M, p) = f(M, p[]) - ∂f_ = _to_mutating_gradient(∂f, evaluation) - rs = subgradient_method(M, f_, ∂f_, q; evaluation=evaluation, kwargs...) - #return just a number if the return type is the same as the type of q - return (typeof(q) == typeof(rs)) ? 
rs[] : rs + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + ∂f_ = _ensure_mutating_gradient(∂f, p, evaluation) + sgo = ManifoldSubgradientObjective(f_, ∂f_; evaluation=evaluation) + rs = subgradient_method(M, sgo, p_; evaluation=evaluation, kwargs...) + return _ensure_matching_output(p, rs) end function subgradient_method( M::AbstractManifold, sgo::O, p; kwargs... @@ -179,7 +167,9 @@ function subgradient_method!( sgo::O, p; retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), - stepsize::Stepsize=default_stepsize(M, SubGradientMethodState), + stepsize::Union{Stepsize,ManifoldDefaultsFactory}=default_stepsize( + M, SubGradientMethodState + ), stopping_criterion::StoppingCriterion=StopAfterIteration(5000), X=zero_vector(M, p), kwargs..., @@ -187,10 +177,10 @@ function subgradient_method!( dsgo = decorate_objective!(M, sgo; kwargs...) mp = DefaultManoptProblem(M, dsgo) sgs = SubGradientMethodState( - M, - p; + M; + p=p, stopping_criterion=stopping_criterion, - stepsize=stepsize, + stepsize=_produce_type(stepsize, M), retraction_method=retraction_method, X=X, ) diff --git a/src/solvers/truncated_conjugate_gradient_descent.jl b/src/solvers/truncated_conjugate_gradient_descent.jl index 324277e210..41a86a2ddb 100644 --- a/src/solvers/truncated_conjugate_gradient_descent.jl +++ b/src/solvers/truncated_conjugate_gradient_descent.jl @@ -15,10 +15,10 @@ Let `T` denote the type of a tangent vector and `R <: Real`. the function has to work inplace of `Y`, that is `(M, Y, p, X) -> Y`, where `X` and `Y` can be the same memory. 
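The subgradient iteration described above can be sketched in the Euclidean case for the nonsmooth cost ``f(p) = \lVert p - q\rVert_1``, using one subgradient element per step, a decreasing step size, and tracking the best iterate since the values need not decrease monotonically (the helper names are hypothetical, not Manopt.jl API):

```julia
using LinearAlgebra

f(p, q) = norm(p - q, 1)
∂f(p, q) = sign.(p - q)   # one element of the subdifferential (0 where p == q)

# Euclidean sketch (assumption): the retraction is a plain vector step and
# the step size decreases like 1/sqrt(k).
function subgradient_method_sketch(q, p; iterations=2000)
    p_best, f_best = copy(p), f(p, q)
    for k in 1:iterations
        p = p .- (1 / sqrt(k)) .* ∂f(p, q)
        if f(p, q) < f_best              # keep the best iterate seen so far
            p_best, f_best = copy(p), f(p, q)
        end
    end
    return p_best
end

q = [1.0, -2.0]
p_res = subgradient_method_sketch(q, [10.0, 10.0])
```

Since ``Σ_k 1/\sqrt{k}`` diverges while the step lengths vanish, the best iterate approaches the minimizer `q`.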
* `randomize`: indicate whether `X` is initialised to a random vector or not * `residual::T`: the gradient of the model ``m(Y)`` -* $(_field_stop) +$(_var(:Field, :stopping_criterion, "stop")) * `θ::R`: the superlinear convergence target rate of ``1+θ`` * `trust_region_radius::R`: the trust-region radius -* `X::T`: the gradient ``$(_l_grad)f(p)`` +* `X::T`: the gradient ``$(_tex(:grad))f(p)`` * `Y::T`: current iterate tangent vector * `z::T`: the preconditioned residual * `z_r::R`: inner product of the residual and `z` @@ -32,7 +32,6 @@ Initialise the TCG state. ## Input * `TpM`: a [`TangentSpace`](@extref `ManifoldsBase.TangentSpace`) -* `Y`: an initial tangent vector ## Keyword arguments @@ -41,10 +40,11 @@ Initialise the TCG state. * `randomize=false` * `θ=1.0` * `trust_region_radius=`[`injectivity_radius`](@extref `ManifoldsBase.injectivity_radius-Tuple{AbstractManifold}`)`(base_manifold(TpM)) / 4` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(`$(_link_manifold_dimension("base_manifold(Tpm)"))`)` - $(_sc_any)[`StopWhenResidualIsReducedByFactorOrPower`](@ref)`(; κ=κ, θ=θ)`$(_sc_any)[`StopWhenTrustRegionIsExceeded`](@ref)`()` - $(_sc_any)[`StopWhenCurvatureIsNegative`](@ref)`()`$(_sc_any)[`StopWhenModelIncreased`](@ref)`()`: - $(_kw_stopping_criterion) +$(_var( + :Keyword, + :stopping_criterion; + default="[`StopAfterIteration`](@ref)`(`$(_link(:manifold_dimension; M="base_manifold(Tpm)"))`)`$(_sc(:Any))[`StopWhenResidualIsReducedByFactorOrPower`](@ref)`(; κ=κ, θ=θ)`$(_sc(:Any))[`StopWhenTrustRegionIsExceeded`](@ref)`()`$(_sc(:Any))[`StopWhenCurvatureIsNegative`](@ref)`()`$(_sc(:Any))[`StopWhenModelIncreased`](@ref)`()`")) +$(_var(:Keyword, :X)) # See also @@ -71,8 +71,8 @@ mutable struct TruncatedConjugateGradientState{T,R<:Real,SC<:StoppingCriterion,P project!::Proj initialResidualNorm::Float64 function TruncatedConjugateGradientState( - TpM::TangentSpace, - Y::T=rand(TpM); + TpM::TangentSpace; + X::T=rand(TpM), 
trust_region_radius::R=injectivity_radius(base_manifold(TpM)) / 4.0, randomize::Bool=false, project!::F=copyto!, @@ -91,7 +91,7 @@ mutable struct TruncatedConjugateGradientState{T,R<:Real,SC<:StoppingCriterion,P ) where {T,R<:Real,F} tcgs = new{T,R,typeof(stopping_criterion),F}() tcgs.stop = stopping_criterion - tcgs.Y = Y + tcgs.Y = X tcgs.trust_region_radius = trust_region_radius tcgs.randomize = randomize tcgs.project! = project! @@ -116,19 +116,15 @@ function show(io::IO, tcgs::TruncatedConjugateGradientState) This indicates convergence: $Conv""" return print(io, s) end -function set_manopt_parameter!(tcgs::TruncatedConjugateGradientState, ::Val{:Iterate}, Y) +function set_parameter!(tcgs::TruncatedConjugateGradientState, ::Val{:Iterate}, Y) return tcgs.Y = Y end get_iterate(tcgs::TruncatedConjugateGradientState) = tcgs.Y -function set_manopt_parameter!( - tcgs::TruncatedConjugateGradientState, ::Val{:TrustRegionRadius}, r -) +function set_parameter!(tcgs::TruncatedConjugateGradientState, ::Val{:TrustRegionRadius}, r) return tcgs.trust_region_radius = r end -function get_manopt_parameter( - tcgs::TruncatedConjugateGradientState, ::Val{:TrustRegionExceeded} -) +function get_parameter(tcgs::TruncatedConjugateGradientState, ::Val{:TrustRegionExceeded}) return (tcgs.YPY >= tcgs.trust_region_radius^2) end @@ -142,13 +138,13 @@ A functor for testing if the norm of residual at the current iterate is reduced either by a power of 1+θ or by a factor κ compared to the norm of the initial residual. The criterion hence reads -``$(_l_norm("r_k","p")) ≦ $(_l_norm("r_0","p^{(0)}")) \\min \\bigl( κ, $(_l_norm("r_0","p^{(0)}")) \\bigr)``. +``$(_tex(:norm, "r_k"; index="p")) ≦ $(_tex(:norm, "r_0"; index="p^{(0)}")) $(_tex(:min)) $(_tex(:bigl))( κ, $(_tex(:norm, "r_0"; index="p^{(0)}")) $(_tex(:bigr)))``. 
# Fields * `κ`: the reduction factor * `θ`: part of the reduction power -* $(_field_at_iteration) +$(_var(:Field, :at_iteration)) # Constructor @@ -205,11 +201,11 @@ function show(io::IO, c::StopWhenResidualIsReducedByFactorOrPower) end @doc raw""" - update_stopping_criterion!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualPower, v) + set_parameter!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualPower, v) Update the residual Power `θ` to `v`. """ -function update_stopping_criterion!( +function set_parameter!( c::StopWhenResidualIsReducedByFactorOrPower, ::Val{:ResidualPower}, v ) c.θ = v @@ -217,11 +213,11 @@ function update_stopping_criterion!( end @doc raw""" - update_stopping_criterion!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualFactor, v) + set_parameter!(c::StopWhenResidualIsReducedByFactorOrPower, :ResidualFactor, v) Update the residual Factor `κ` to `v`. """ -function update_stopping_criterion!( +function set_parameter!( c::StopWhenResidualIsReducedByFactorOrPower, ::Val{:ResidualFactor}, v ) c.κ = v @@ -232,12 +228,12 @@ end StopWhenTrustRegionIsExceeded <: StoppingCriterion A functor for testing if the norm of the next iterate in the Steihaug-Toint truncated conjugate gradient -method is larger than the trust-region radius ``θ ≤ $(_l_norm("Y^{(k)}^{*}","p^{(k)}"))`` +method is larger than the trust-region radius ``θ ≤ $(_tex(:norm, "Y^{(k)}^{*}"; index="p^{(k)}"))`` and to end the algorithm when the trust region has been left. # Fields -* $(_field_at_iteration) +$(_var(:Field, :at_iteration)) * `trr` the trust region radius * `YPY` the computed norm of ``Y``. @@ -292,13 +288,13 @@ end StopWhenCurvatureIsNegative <: StoppingCriterion A functor for testing if the curvature of the model is negative, -``⟨δ_k, $(_l_Hess) F(p)[δ_k]⟩_p ≦ 0``. +``⟨δ_k, $(_tex(:Hess)) F(p)[δ_k]⟩_p ≦ 0``. In this case, the model is not strictly convex, and the stepsize as computed does not yield a reduction of the model. 
# Fields -* $(_field_at_iteration) +$(_var(:Field, :at_iteration)) * `value`: stores the value of the inner product. * `reason`: stores a reason of stopping if the stopping criterion has been reached, see [`get_reason`](@ref). @@ -352,7 +348,7 @@ A functor for testing if the curvature of the model value increased. # Fields -* $(_field_at_iteration) +$(_var(:Field, :at_iteration)) * `model_value`: stores the last model value * `inc_model_value`: stores the model value that increased @@ -411,23 +407,21 @@ _doc_TCG_subproblem = raw""" ``` """ _doc_TCGD = """ - truncated_conjugate_gradient_descent(M, f, grad_f, p; kwargs...) - truncated_conjugate_gradient_descent(M, f, grad_f, p, X; kwargs...) - truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f; kwargs...) - truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p; kwargs...) - truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p, X; kwargs...) - truncated_conjugate_gradient_descent(M, mho::ManifoldHessianObjective, p, X; kwargs...) - truncated_conjugate_gradient_descent(M, trmo::TrustRegionModelObjective, p, X; kwargs...) - truncated_conjugate_gradient_descent!(M, f, grad_f, Hess_f, p, X; kwargs...) - truncated_conjugate_gradient_descent!(M, f, grad_f, p, X; kwargs...) - truncated_conjugate_gradient_descent!(M, mho::ManifoldHessianObjective, p, X; kwargs...) - truncated_conjugate_gradient_descent!(M, trmo::TrustRegionModelObjective, p, X; kwargs...) + truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, p=rand(M), X=rand(M; vector_at=p); + kwargs... + ) + truncated_conjugate_gradient_descent(M, mho::ManifoldHessianObjective, p=rand(M), X=rand(M; vector_at=p); + kwargs... + ) + truncated_conjugate_gradient_descent(M, trmo::TrustRegionModelObjective, p=rand(M), X=rand(M; vector_at=p); + kwargs... + ) solve the trust-region subproblem $(_doc_TCG_subproblem) -on a manifold ``$(_l_M)`` by using the Steihaug-Toint truncated conjugate-gradient (tCG) method. 
+on a manifold ``$(_math(:M))`` by using the Steihaug-Toint truncated conjugate-gradient (tCG) method. This can be done inplace of `X`. For a description of the algorithm and theorems offering convergence guarantees, @@ -435,12 +429,12 @@ see [AbsilBakerGallivan:2006, ConnGouldToint:2000](@cite). # Input -$(_arg_M) -$(_arg_f) -$(_arg_grad_f) -$(_arg_Hess_f) -$(_arg_p) -$(_arg_X) +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :grad_f)) +$(_var(:Argument, :Hess_f)) +$(_var(:Argument, :p)) +$(_var(:Argument, :X)) Instead of the three functions, you either provide a [`ManifoldHessianObjective`](@ref) `mho` which is then used to build the trust region model, or a [`TrustRegionModelObjective`](@ref) `trmo` @@ -448,7 +442,7 @@ directly. # Keyword arguments -* $(_kw_evaluation_default): $(_kw_evaluation) +$(_var(:Keyword, :evaluation)) * `preconditioner`: a preconditioner for the Hessian H. This is either an allocating function `(M, p, X) -> Y` or an in-place function `(M, Y, p, X) -> Y`, see `evaluation`, and by default set to the identity. @@ -456,18 +450,17 @@ directly. * `κ=0.1`: the linear convergence target rate. * `project!=copyto!`: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace of `Y`, that is `(M, Y, p, X) -> Y`, where `X` and `Y` can be the same memory. -* `randomize=false`: indicate whether `X` is initialised to a random vector or not. - This disables preconditioning. 
-* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(`$(_link_manifold_dimension("base_manifold(Tpm)"))`)` - $(_sc_any)[`StopWhenResidualIsReducedByFactorOrPower`](@ref)`(; κ=κ, θ=θ)`$(_sc_any)[`StopWhenTrustRegionIsExceeded`](@ref)`()` - $(_sc_any)[`StopWhenCurvatureIsNegative`](@ref)`()`$(_sc_any)[`StopWhenModelIncreased`](@ref)`()`: - $(_kw_stopping_criterion) +* `randomize=false`: indicate whether `X` is initialised to a random vector or not. This disables preconditioning. +$(_var(:Keyword, :retraction_method)) +$(_var( + :Keyword, + :stopping_criterion; + default="[`StopAfterIteration`](@ref)`(`$(_link(:manifold_dimension; M="base_manifold(Tpm)"))`)`$(_sc(:Any))[`StopWhenResidualIsReducedByFactorOrPower`](@ref)`(; κ=κ, θ=θ)`$(_sc(:Any))[`StopWhenTrustRegionIsExceeded`](@ref)`()`$(_sc(:Any))[`StopWhenCurvatureIsNegative`](@ref)`()`$(_sc(:Any))[`StopWhenModelIncreased`](@ref)`()`")) * `trust_region_radius=`[`injectivity_radius`](@extref `ManifoldsBase.injectivity_radius-Tuple{AbstractManifold}`)`(M) / 4`: the initial trust-region radius -$(_kw_others) +$(_note(:OtherKeywords)) -$(_doc_sec_output) +$(_note(:OutputSection)) # See also @@ -477,65 +470,13 @@ $(_doc_sec_output) @doc "$(_doc_TCGD)" truncated_conjugate_gradient_descent(M::AbstractManifold, args; kwargs...) # No Hessian, no point/vector -function truncated_conjugate_gradient_descent(M::AbstractManifold, f, grad_f; kwargs...) - return truncated_conjugate_gradient_descent(M, f, grad_f, rand(M); kwargs...) -end -# No Hessian, no vector -function truncated_conjugate_gradient_descent(M::AbstractManifold, f, grad_f, p; kwargs...) - return truncated_conjugate_gradient_descent( - M, f, grad_f, p, rand(M; vector_at=p); kwargs... 
- ) -end -# no Hessian -function truncated_conjugate_gradient_descent( - M::AbstractManifold, - f, - grad_f, - p, - X; - evaluation=AllocatingEvaluation(), - retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), - kwargs..., -) - Hess_f = ApproxHessianFiniteDifference( - M, copy(M, p), grad_f; evaluation=evaluation, retraction_method=retraction_method - ) - return truncated_conjugate_gradient_descent( - M, - f, - grad_f, - Hess_f, - p, - X; - evaluation=evaluation, - retraction_method=retraction_method, - kwargs..., - ) -end -# no point/vector -function truncated_conjugate_gradient_descent( - M::AbstractManifold, f, grad_f, Hess_f::TH; kwargs... -) where {TH<:Function} - return truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f, rand(M); kwargs...) -end -# no vector -function truncated_conjugate_gradient_descent( - M::AbstractManifold, f, grad_f, Hess_f::TH, p; kwargs... -) where {TH<:Function} - return truncated_conjugate_gradient_descent( - M, f, grad_f, Hess_f, p, rand(M; vector_at=p); kwargs... - ) -end -# -# All defaults filled, generate mho -# function truncated_conjugate_gradient_descent( M::AbstractManifold, f, grad_f, - Hess_f::TH, - p, - X; + Hess_f, + p=rand(M), + X=rand(M; vector_at=p); evaluation=AllocatingEvaluation(), preconditioner=if evaluation isa InplaceEvaluation (M, Y, p, X) -> (Y .= X) @@ -543,52 +484,26 @@ function truncated_conjugate_gradient_descent( (M, p, X) -> X end, kwargs..., -) where {TH<:Function} - mho = ManifoldHessianObjective(f, grad_f, Hess_f, preconditioner; evaluation=evaluation) - return truncated_conjugate_gradient_descent( - M, mho, p, X; evaluation=evaluation, kwargs... 
+) + p_ = _ensure_mutating_variable(p) + X_ = _ensure_mutating_variable(X) + f_ = _ensure_mutating_cost(f, p) + grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation) + preconditioner_ = _ensure_mutating_hessian(preconditioner, p, evaluation) + Hess_f_ = _ensure_mutating_hessian(Hess_f, p, evaluation) + + mho = ManifoldHessianObjective( + f_, grad_f_, Hess_f_, preconditioner_; evaluation=evaluation ) -end -function truncated_conjugate_gradient_descent( - M::AbstractManifold, - f, - grad_f, - Hess_f::TH, #fill a default below before dispatching on p::Number - p::Number, - X::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - preconditioner=(M, p, X) -> X, - kwargs..., -) where {TH<:Function} - q = [p] - Y = [X] - f_(M, p) = f(M, p[]) - if evaluation isa AllocatingEvaluation - grad_f_ = (M, p) -> [grad_f(M, p[])] - Hess_f_ = (M, p, X) -> [Hess_f(M, p[], X[])] - precon_ = (M, p, X) -> [preconditioner(M, p[], X[])] - else - grad_f_ = (M, X, p) -> (X .= [grad_f(M, p[])]) - Hess_f_ = (M, Y, p, X) -> (Y .= [Hess_f(M, p[], X[])]) - precon_ = (M, Y, p, X) -> (Y .= [preconditioner(M, p[], X[])]) - end rs = truncated_conjugate_gradient_descent( - M, - f_, - grad_f_, - Hess_f_, - q, - Y; - preconditioner=precon_, - evaluation=evaluation, - kwargs..., + M, mho, p_, X_; evaluation=evaluation, kwargs... ) - return (typeof(q) == typeof(rs)) ? rs[] : rs + return _ensure_matching_output(p, rs) end # # Objective 1 -> generate model function truncated_conjugate_gradient_descent( - M::AbstractManifold, mho::O, p, X; kwargs... + M::AbstractManifold, mho::O, p=rand(M), X=rand(M; vector_at=p); kwargs... ) where {O<:Union{ManifoldHessianObjective,AbstractDecoratedManifoldObjective}} trmo = TrustRegionModelObjective(mho) TpM = TangentSpace(M, copy(M, p)) @@ -611,38 +526,11 @@ end @doc "$(_doc_TCGD)" truncated_conjugate_gradient_descent!(M::AbstractManifold, args...; kwargs...) 
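The hunk above replaces the removed hand-written scalar handling (wrapping a `p::Number` into `[p]` and unwrapping the result) with the `_ensure_mutating_*` helpers. A minimal standalone sketch of that wrapping pattern, using hypothetical simplified functions rather than the actual Manopt.jl internals:

```julia
# Sketch of the scalar-wrapping pattern behind the `_ensure_mutating_*`
# helpers: numbers are immutable in Julia, so a scalar point is wrapped in a
# 1-element array the solver can mutate in-place, and the result is unwrapped
# again so the output type matches the input type.
ensure_mutating_variable(p::Number) = [p]  # wrap scalars
ensure_mutating_variable(p) = p            # arrays pass through unchanged
ensure_matching_output(p::Number, rs::AbstractArray) = rs[]  # unwrap result
ensure_matching_output(p, rs) = rs

p = 2.0                            # scalar "point" on a 1-dimensional manifold
p_ = ensure_mutating_variable(p)   # [2.0], now mutable
p_[1] /= 2                         # an in-place update, as a solver would do
println(ensure_matching_output(p, p_))  # prints 1.0, a scalar again
```

This keeps every internal code path array-based while the user-facing call still accepts and returns plain numbers.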
-# no Hessian function truncated_conjugate_gradient_descent!( M::AbstractManifold, f, grad_f, - p, - X; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), - kwargs..., -) - hess_f = ApproxHessianFiniteDifference( - M, copy(M, p), grad_f; evaluation=evaluation, retraction_method=retraction_method - ) - return truncated_conjugate_gradient_descent!( - M, - f, - grad_f, - hess_f, - p, - X; - evaluation=evaluation, - retraction_method=retraction_method, - kwargs..., - ) -end -# From functions to objective -function truncated_conjugate_gradient_descent!( - M::AbstractManifold, - f, - grad_f, - Hess_f::TH, + Hess_f, p, X; evaluation::AbstractEvaluationType=AllocatingEvaluation(), @@ -652,7 +540,7 @@ function truncated_conjugate_gradient_descent!( (M, p, X) -> X end, kwargs..., -) where {TH<:Function} +) mho = ManifoldHessianObjective(f, grad_f, Hess_f, preconditioner; evaluation=evaluation) return truncated_conjugate_gradient_descent!( M, mho, p, X; evaluation=evaluation, kwargs... @@ -687,8 +575,8 @@ function truncated_conjugate_gradient_descent!( dtrm = decorate_objective!(TpM, trm; kwargs...) mp = DefaultManoptProblem(TpM, dtrm) tcgs = TruncatedConjugateGradientState( - TpM, - X; + TpM; + X=X, trust_region_radius=trust_region_radius, randomize=randomize, θ=θ, diff --git a/src/solvers/trust_regions.jl b/src/solvers/trust_regions.jl index 3fb55db809..832ee55faa 100644 --- a/src/solvers/trust_regions.jl +++ b/src/solvers/trust_regions.jl @@ -8,56 +8,54 @@ Store the state of the trust-regions solver. * `acceptance_rate`: a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not. 
-* `HX`, `HY`, `HZ`: interim storage (to avoid allocation) of ``$(_l_Hess) f(p)[⋅]` of `X`, `Y`, `Z` +* `HX`, `HY`, `HZ`: interim storage (to avoid allocation) of ``$(_tex(:Hess)) f(p)[⋅]`` of `X`, `Y`, `Z` * `max_trust_region_radius`: the maximum trust-region radius -* $(_field_p) +$(_var(:Field, :p; add=[:as_Iterate])) * `project!`: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace of `Y`, that is `(M, Y, p, X) -> Y`, where `X` and `Y` can be the same memory. -* $(_field_stop) +$(_var(:Field, :stopping_criterion, "stop")) * `randomize`: indicate whether `X` is initialised to a random vector or not * `ρ_regularization`: regularize the model fitness ``ρ`` to avoid division by zero -* $_field_sub_problem -* $_field_sub_state +$(_var(:Field, :sub_problem)) +$(_var(:Field, :sub_state)) * `σ`: Gaussian standard deviation when creating the random initial tangent vector This field has no effect, when `randomize` is false. * `trust_region_radius`: the trust-region radius -* $(_field_X) +$(_var(:Field, :X)) * `Y`: the solution (tangent vector) of the subsolver * `Z`: the Cauchy point (only used if random is activated) # Constructors - TrustRegionsState(M, mho; kwargs...) - TrustRegionsState(M, p, mho; kwargs...) + TrustRegionsState(M, mho::AbstractManifoldHessianObjective; kwargs...) TrustRegionsState(M, sub_problem, sub_state; kwargs...) - TrustRegionsState(M, p, sub_problem, sub_state; kwargs...) - TrustRegionsState(M, f::Function; evaluation=AllocatingEvaluation, kwargs...) - TrustRegionsState(M, p, f; evaluation=AllocatingEvaluation, kwargs...) + TrustRegionsState(M, sub_problem; evaluation=AllocatingEvaluation(), kwargs...) -# Input - -$(_arg_M) -$(_arg_p) +Create a trust region state.
+* given an [`AbstractManifoldHessianObjective`](@ref) `mho`, the default sub solver, + a [`TruncatedConjugateGradientState`](@ref) with `mho` used to define the problem on a tangent space is created +* given a `sub_problem` and an `evaluation=` keyword, the sub problem solver is assumed to be the closed form solution, + where `evaluation` determines how to call the sub function. -as well as either # Input -* an [`ManifoldHessianObjective`](@ref) `mho`, then `sub_state` and `sub_problem` are filled with a default (deprecated). -* a `sub_problem` and a `sub_state` -* a function `f` and its `evaluation` as a closed form solution for the sub solver. +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :sub_problem)) +$(_var(:Argument, :sub_state)) ## Keyword arguments * `acceptance_rate=0.1` * `max_trust_region_radius=sqrt(manifold_dimension(M))` +$(_var(:Keyword, :p; add=:as_Initial)) * `project!=copyto!` -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(1000)`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(1e-6)`: - $(_kw_stopping_criterion) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(1000)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-6)`")) * `randomize=false` * `ρ_regularization=10000.0` * `θ=1.0` * `trust_region_radius=max_trust_region_radius / 8` -* $(_kw_X_default): $(_kw_X) +$(_var(:Keyword, :X; add=:as_Memory)) # See also @@ -148,46 +146,14 @@ mutable struct TrustRegionsState{ return trs end end -# No point, no state -> add point -function TrustRegionsState( - M, sub_problem::Pr; kwargs... -) where {Pr<:Union{AbstractManoptProblem,<:Function,AbstractManifoldHessianObjective}} - return TrustRegionsState(M, rand(M), sub_problem; kwargs...) -end -# No point but state -> add point -function TrustRegionsState(M, sub_problem, sub_state::AbstractManoptSolverState; kwargs...) - return TrustRegionsState(M, rand(M), sub_problem, sub_state; kwargs...)
-end -# point, but no state for a function -> add evaluation as state -function TrustRegionsState( - M, - p, - sub_problem::Function; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - kwargs..., -) - return TrustRegionsState( - M, p, sub_problem, ClosedFormSubSolverState(evaluation); kwargs... - ) -end -function TrustRegionsState( - M, p, mho::H; kwargs... -) where {H<:AbstractManifoldHessianObjective} - TpM = TangentSpace(M, copy(M, p)) - problem = DefaultManoptProblem(TpM, TrustRegionModelObjective(mho)) - state = TruncatedConjugateGradientState(TpM, get_gradient(M, mho, p)) - return TrustRegionsState(M, p, problem, state; kwargs...) -end + function TrustRegionsState( - M::TM, - p::P, + M::AbstractManifold, sub_problem::Pr, - sub_state::Union{AbstractEvaluationType,AbstractManoptSolverState}=TruncatedConjugateGradientState( - TangentSpace(M, copy(M, p)), zero_vector(M, p) - ); + sub_state::St; + p::P=rand(M), X::T=zero_vector(M, p), - ρ_prime::R=0.1, #deprecated, remove on next breaking change - acceptance_rate=ρ_prime, + acceptance_rate=0.1, ρ_regularization::R=1000.0, randomize::Bool=false, stopping_criterion::SC=StopAfterIteration(1000) | StopWhenGradientNormLess(1e-6), @@ -201,17 +167,16 @@ function TrustRegionsState( project!::Proj=copyto!, σ=randomize ? 
1e-4 : 0.0, ) where { - TM<:AbstractManifold, - Pr<:AbstractManoptProblem, P, T, + Pr<:Union{AbstractManoptProblem,F} where {F}, + St<:AbstractManoptSolverState, R<:Real, SC<:StoppingCriterion, RTR<:AbstractRetractionMethod, Proj, } - sub_state_storage = maybe_wrap_evaluation_type(sub_state) - return TrustRegionsState{P,T,Pr,typeof(sub_state_storage),SC,RTR,R,Proj}( + return TrustRegionsState{P,T,Pr,St,SC,RTR,R,Proj}( p, X, trust_region_radius, @@ -224,13 +189,27 @@ function TrustRegionsState( reduction_threshold, augmentation_threshold, sub_problem, - sub_state_storage, + sub_state, project!, reduction_factor, augmentation_factor, σ, ) end +function TrustRegionsState( + M::AbstractManifold, sub_problem; evaluation::E=AllocatingEvaluation(), kwargs... +) where {E<:AbstractEvaluationType} + cfs = ClosedFormSubSolverState(; evaluation=evaluation) + return TrustRegionsState(M, sub_problem, cfs; kwargs...) +end +function TrustRegionsState( + M::AbstractManifold, mho::AbstractManifoldHessianObjective; p=rand(M), kwargs... +) + TpM = TangentSpace(M, copy(M, p)) + problem = DefaultManoptProblem(TpM, TrustRegionModelObjective(mho)) + state = TruncatedConjugateGradientState(TpM; X=get_gradient(M, mho, p)) + return TrustRegionsState(M, problem, state; p=p, kwargs...) +end get_iterate(trs::TrustRegionsState) = trs.p function set_iterate!(trs::TrustRegionsState, M, p) copyto!(M, trs.p, p) @@ -289,11 +268,11 @@ by default the [`truncated_conjugate_gradient_descent`](@ref) is used. 
# Input -$(_arg_M) -$(_arg_f) -$(_arg_grad_f) -$(_arg_Hess_f) -$(_arg_p) +$(_var(:Argument, :M; type=true)) +$(_var(:Argument, :f)) +$(_var(:Argument, :grad_f)) +$(_var(:Argument, :Hess_f)) +$(_var(:Argument, :p)) # Keyword arguments @@ -303,7 +282,7 @@ $(_arg_p) * `augmentation_threshold=0.75`: trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented) * `augmentation_factor=2.0`: trust-region augmentation factor -* $(_kw_evaluation_default): $(_kw_evaluation) +$(_var(:Keyword, :evaluation)) * `κ=0.1`: the linear convergence target rate of the tCG method [`truncated_conjugate_gradient_descent`](@ref), and is used in a stopping criterion therein * `max_trust_region_radius`: the maximum trust-region radius @@ -318,16 +297,12 @@ $(_arg_p) * `reduction_factor=0.25`: trust-region reduction factor * `reduction_threshold=0.1`: trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced by `reduction_factor`. -* $(_kw_retraction_method_default): $(_kw_retraction_method) -* `stopping_criterion=`[`StopAfterIteration`](@ref)`(1000)`$(_sc_any)[`StopWhenGradientNormLess`](@ref)`(1e-6)`: - $(_kw_stopping_criterion) -* $(_kw_sub_kwargs_default): $(_kw_sub_kwargs) -* `sub_stopping_criterion` – the default from [`truncated_conjugate_gradient_descent`](@ref): - $(_kw_stopping_criterion) -* `sub_problem=`[`DefaultManoptProblem`](@ref)`(M, `[`ConstrainedManifoldObjective`](@ref)`(subcost, subgrad; evaluation=evaluation))`: - problem for the subsolver -* `sub_state=`[`QuasiNewtonState`](@ref)) using [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref) with [`InverseBFGS`](@ref) and `sub_stopping_criterion` as a stopping criterion. - See also `sub_kwargs=`. 
+$(_var(:Keyword, :retraction_method)) +$(_var(:Keyword, :stopping_criterion; default="[`StopAfterIteration`](@ref)`(1000)`$(_sc(:Any))[`StopWhenGradientNormLess`](@ref)`(1e-6)`")) +$(_var(:Keyword, :sub_kwargs)) +$(_var(:Keyword, :stopping_criterion, "sub_stopping_criterion"; default="( see [`truncated_conjugate_gradient_descent`](@ref))")) +$(_var(:Keyword, :sub_problem; default="[`DefaultManoptProblem`](@ref)`(M, `[`ConstrainedManifoldObjective`](@ref)`(subcost, subgrad; evaluation=evaluation))`")) +$(_var(:Keyword, :sub_state; default="[`QuasiNewtonState`](@ref)", add=" where [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref) with [`InverseBFGS`](@ref) is used")) * `θ=1.0`: the superlinear convergence target rate of ``1+θ`` of the tCG-method [`truncated_conjugate_gradient_descent`](@ref), and is used in a stopping criterion therein * `trust_region_radius=`[`injectivity_radius`](@extref `ManifoldsBase.injectivity_radius-Tuple{AbstractManifold}`)`(M) / 4`: the initial trust-region radius @@ -335,9 +310,9 @@ $(_arg_p) For the case that no Hessian is provided, the Hessian is computed using finite difference, see [`ApproxHessianFiniteDifference`](@ref). -$(_kw_others) +$(_note(:OtherKeywords)) -$(_doc_sec_output) +$(_note(:OutputSection)) # See also [`truncated_conjugate_gradient_descent`](@ref) @@ -366,42 +341,22 @@ function trust_regions( end, kwargs..., ) where {TH<:Function} - mho = ManifoldHessianObjective(f, grad_f, Hess_f, preconditioner; evaluation=evaluation) - return trust_regions(M, mho, p; evaluation=evaluation, kwargs...) 
-end -# Hessian (Function) and point (but a number) -function trust_regions( - M::AbstractManifold, - f, - grad_f, - Hess_f::TH, #fill a default below before dispatching on p::Number - p::Number; - evaluation::AbstractEvaluationType=AllocatingEvaluation(), - preconditioner=(M, p, X) -> X, - kwargs..., -) where {TH<:Function} - q = [p] - f_(M, p) = f(M, p[]) - Hess_f_ = Hess_f - if evaluation isa AllocatingEvaluation - grad_f_ = (M, p) -> [grad_f(M, p[])] - Hess_f_ = (M, p, X) -> [Hess_f(M, p[], X[])] - precon_ = (M, p, X) -> [preconditioner(M, p[], X[])] - else - grad_f_ = (M, X, p) -> (X .= [grad_f(M, p[])]) - Hess_f_ = (M, Y, p, X) -> (Y .= [Hess_f(M, p[], X[])]) - precon_ = (M, Y, p, X) -> (Y .= [preconditioner(M, p[], X[])]) - end - rs = trust_regions( - M, f_, grad_f_, Hess_f_, q; preconditioner=precon_, evaluation=evaluation, kwargs... + p_ = _ensure_mutating_variable(p) + f_ = _ensure_mutating_cost(f, p) + grad_f_ = _ensure_mutating_gradient(grad_f, p, evaluation) + Hess_f_ = _ensure_mutating_hessian(Hess_f, p, evaluation) + preconditioner_ = _ensure_mutating_hessian(preconditioner, p, evaluation) + mho = ManifoldHessianObjective( + f_, grad_f_, Hess_f_, preconditioner_; evaluation=evaluation ) - return (typeof(q) == typeof(rs)) ? rs[] : rs + rs = trust_regions(M, mho, p_; evaluation=evaluation, kwargs...) + return _ensure_matching_output(p, rs) end # neither Hessian (Function) nor point function trust_regions(M::AbstractManifold, f, grad_f; kwargs...) return trust_regions(M, f, grad_f, rand(M); kwargs...) 
end -# no Hessian (Function), but point (any) +# no Hessian (Function), point (any) function trust_regions( M::AbstractManifold, f::TF, @@ -512,8 +467,8 @@ function trust_regions!( StopWhenModelIncreased(), sub_state::AbstractManoptSolverState=decorate_state!( TruncatedConjugateGradientState( - TangentSpace(M, copy(M, p)), - zero_vector(M, p); + TangentSpace(M, copy(M, p)); + X=zero_vector(M, p), θ=θ, κ=κ, trust_region_radius, @@ -540,9 +495,9 @@ function trust_regions!( dmp = DefaultManoptProblem(M, dmho) trs = TrustRegionsState( M, - p, sub_problem, - sub_state; + maybe_wrap_evaluation_type(sub_state); + p=p, X=get_gradient(dmp, p), trust_region_radius=trust_region_radius, max_trust_region_radius=max_trust_region_radius, @@ -594,9 +549,9 @@ function step_solver!(mp::AbstractManoptProblem, trs::TrustRegionsState, k) end # Update the current gradient get_gradient!(M, trs.X, mho, trs.p) - set_manopt_parameter!(trs.sub_problem, :Manifold, :Basepoint, copy(M, trs.p)) - set_manopt_parameter!(trs.sub_state, :Iterate, copy(M, trs.p, trs.Y)) - set_manopt_parameter!(trs.sub_state, :TrustRegionRadius, trs.trust_region_radius) + set_parameter!(trs.sub_problem, :Manifold, :Basepoint, copy(M, trs.p)) + set_parameter!(trs.sub_state, :Iterate, copy(M, trs.p, trs.Y)) + set_parameter!(trs.sub_state, :TrustRegionRadius, trs.trust_region_radius) solve!(trs.sub_problem, trs.sub_state) # copyto!(M, trs.Y, trs.p, get_solver_result(trs.sub_state)) @@ -644,7 +599,7 @@ function step_solver!(mp::AbstractManoptProblem, trs::TrustRegionsState, k) if ρ < trs.reduction_threshold || !model_decreased || isnan(ρ) trs.trust_region_radius *= trs.reduction_factor elseif ρ > trs.augmentation_threshold && - get_manopt_parameter(trs.sub_state, :TrustRegionExceeded) + get_parameter(trs.sub_state, :TrustRegionExceeded) # (b) performed great and exceed/reach the trust region boundary -> increase radius trs.trust_region_radius = min( trs.augmentation_factor * trs.trust_region_radius, 
trs.max_trust_region_radius diff --git a/test/helpers/test_linesearches.jl b/test/helpers/test_linesearches.jl index 290f8fd64f..1f158c3ed3 100644 --- a/test/helpers/test_linesearches.jl +++ b/test/helpers/test_linesearches.jl @@ -59,7 +59,7 @@ using Test rosenbrock_throw, rosenbrock_grad!; evaluation=InplaceEvaluation() ) mp_throw = DefaultManoptProblem(M, mgo_throw) - st_qn = QuasiNewtonState(M, x0) + st_qn = QuasiNewtonState(M; p=x0) initialize_solver!(mp, st_qn) ls_mt = Manopt.LineSearchesStepsize(M, LineSearches.MoreThuente()) @test_throws ErrorException ls_mt(mp_throw, st_qn, 1; fp=rosenbrock(M, x0)) diff --git a/test/helpers/test_manifold_extra_functions.jl b/test/helpers/test_manifold_extra_functions.jl index d924887b5c..e6bb8e1888 100644 --- a/test/helpers/test_manifold_extra_functions.jl +++ b/test/helpers/test_manifold_extra_functions.jl @@ -1,4 +1,4 @@ -using Manifolds, Manopt, Test, ManifoldsBase +using Manifolds, Manopt, Test, ManifoldsBase, RecursiveArrayTools using LinearAlgebra: I using Random diff --git a/test/plans/test_conjugate_gradient_plan.jl b/test/plans/test_conjugate_gradient_plan.jl index 14d658fd07..e41a29e597 100644 --- a/test/plans/test_conjugate_gradient_plan.jl +++ b/test/plans/test_conjugate_gradient_plan.jl @@ -9,41 +9,59 @@ Manopt.update_rule_storage_vectors(::DummyCGCoeff) = Tuple{} @testset "Test Restart CG" begin M = Euclidean(2) du = DummyCGCoeff() - dur2 = ConjugateGradientBealeRestart(du, 0.3) - dur3 = ConjugateGradientBealeRestart(du, 0.1) + dur2 = ConjugateGradientBealeRestart(du; threshold=0.3) + dur3 = ConjugateGradientBealeRestart(du; threshold=0.1) f(M, p) = norm(M, p)^2 grad_f(M, p) = p p0 = [1.0, 0.0] pr = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) cgs2 = ConjugateGradientDescentState( - M, - p0; + M; + p=p0, stopping_criterion=StopAfterIteration(2), - stepsize=ConstantStepsize(1.0), + stepsize=Manopt.ConstantStepsize(M, 1.0), coefficient=dur2, ) cgs2.X = [0.0, 0.2] @test cgs2.coefficient(pr, cgs2, 
1) != 0 cgs3 = ConjugateGradientDescentState( - M, - p0; + M; + p=p0, stopping_criterion=StopAfterIteration(2), - stepsize=ConstantStepsize(1.0), + stepsize=Manopt.ConstantStepsize(M, 1.0), coefficient=dur3, ) cgs3.X = [0.0, 0.2] @test cgs3.coefficient(pr, cgs3, 1) == 0 end @testset "representation and summary of Coefficients" begin - pt = repr(ParallelTransport()) - @test repr(ConjugateDescentCoefficient()) == "ConjugateDescentCoefficient()" - @test repr(DaiYuanCoefficient()) == "DaiYuanCoefficient($pt)" - @test repr(HagerZhangCoefficient()) == "HagerZhangCoefficient($pt)" - @test repr(HestenesStiefelCoefficient()) == "HestenesStiefelCoefficient($pt)" - @test repr(PolakRibiereCoefficient()) == "PolakRibiereCoefficient($pt)" - cgbr = ConjugateGradientBealeRestart(ConjugateDescentCoefficient()) - s1 = "ConjugateGradientBealeRestart(ConjugateDescentCoefficient(), 0.2; vector_transport_method=$pt)" + p = ParallelTransport() + pt = repr(p) + M = Euclidean(2) + @test repr(Manopt.ConjugateDescentCoefficient()(M)) == + "Manopt.ConjugateDescentCoefficientRule()" + @test repr(FletcherReevesCoefficient()()) == + "Manopt.FletcherReevesCoefficientRule()" + # either the factory constructor or the factory call needs M, + # so let's alternate + @test repr(Manopt.DaiYuanCoefficient(M; vector_transport_method=p)()) == + "Manopt.DaiYuanCoefficientRule(; vector_transport_method=$pt)" + @test repr(HagerZhangCoefficient(; vector_transport_method=p)(M)) == + "Manopt.HagerZhangCoefficientRule(; vector_transport_method=$pt)" + @test repr(HestenesStiefelCoefficient()(M)) == + "Manopt.HestenesStiefelCoefficientRule(; vector_transport_method=$pt)" + # Requires a manifold + @test_throws MethodError HestenesStiefelCoefficient()() + @test repr(PolakRibiereCoefficient()(M)) == + "Manopt.PolakRibiereCoefficientRule(; vector_transport_method=$(pt))" + cgbr = Manopt.ConjugateGradientBealeRestartRule( + Euclidean(), ConjugateDescentCoefficient() + ) + s1 =
"Manopt.ConjugateGradientBealeRestartRule(Manopt.ConjugateDescentCoefficientRule(); threshold=0.2, vector_transport_method=$(pt))" @test repr(cgbr) == s1 - @test repr(LiuStoreyCoefficient()) == "LiuStoreyCoefficient($pt)" + cgbr2 = Manopt.ConjugateGradientBealeRestartRule(ConjugateDescentCoefficient()) + @test cgbr2.threshold == cgbr.threshold + @test repr(LiuStoreyCoefficient(M)()) == + "Manopt.LiuStoreyCoefficientRule(; vector_transport_method=$pt)" end end diff --git a/test/plans/test_constrained_plan.jl b/test/plans/test_constrained_plan.jl index 0a3deb84c6..a0fd3842b9 100644 --- a/test/plans/test_constrained_plan.jl +++ b/test/plans/test_constrained_plan.jl @@ -1,4 +1,4 @@ -using LRUCache, Manopt, Manifolds, ManifoldsBase, Test +using LRUCache, Manopt, Manifolds, ManifoldsBase, Test, RecursiveArrayTools include("../utils/dummy_types.jl") @@ -421,20 +421,20 @@ include("../utils/dummy_types.jl") Lh(M, LX, p, X) @test LX == Lh(M, p, X) # Get & Set - @test Manopt.set_manopt_parameter!(Lc, :μ, [2.0, 2.0]) == Lc - @test Manopt.get_manopt_parameter(Lc, :μ) == [2.0, 2.0] - @test Manopt.set_manopt_parameter!(Lc, :λ, [2.0]) == Lc - @test Manopt.get_manopt_parameter(Lc, :λ) == [2.0] - - @test Manopt.set_manopt_parameter!(Lg, :μ, [2.0, 2.0]) == Lg - @test Manopt.get_manopt_parameter(Lg, :μ) == [2.0, 2.0] - @test Manopt.set_manopt_parameter!(Lg, :λ, [2.0]) == Lg - @test Manopt.get_manopt_parameter(Lg, :λ) == [2.0] - - @test Manopt.set_manopt_parameter!(Lh, :μ, [2.0, 2.0]) == Lh - @test Manopt.get_manopt_parameter(Lh, :μ) == [2.0, 2.0] - @test Manopt.set_manopt_parameter!(Lh, :λ, [2.0]) == Lh - @test Manopt.get_manopt_parameter(Lh, :λ) == [2.0] + @test Manopt.set_parameter!(Lc, :μ, [2.0, 2.0]) == Lc + @test Manopt.get_parameter(Lc, :μ) == [2.0, 2.0] + @test Manopt.set_parameter!(Lc, :λ, [2.0]) == Lc + @test Manopt.get_parameter(Lc, :λ) == [2.0] + + @test Manopt.set_parameter!(Lg, :μ, [2.0, 2.0]) == Lg + @test Manopt.get_parameter(Lg, :μ) == [2.0, 2.0] + @test
Manopt.set_parameter!(Lg, :λ, [2.0]) == Lg + @test Manopt.get_parameter(Lg, :λ) == [2.0] + + @test Manopt.set_parameter!(Lh, :μ, [2.0, 2.0]) == Lh + @test Manopt.get_parameter(Lh, :μ) == [2.0, 2.0] + @test Manopt.set_parameter!(Lh, :λ, [2.0]) == Lh + @test Manopt.get_parameter(Lh, :λ) == [2.0] end @testset "Full KKT and its norm" begin # Full KKT Vector field @@ -531,12 +531,12 @@ include("../utils/dummy_types.jl") @test Wc == W # get & set for ck in [CKKTvf, CKKTVfJ] - @test Manopt.set_manopt_parameter!(ck, :μ, [2.0, 2.0]) == ck - @test Manopt.get_manopt_parameter(ck, :μ) == [2.0, 2.0] - @test Manopt.set_manopt_parameter!(ck, :s, [2.0, 2.0]) == ck - @test Manopt.get_manopt_parameter(ck, :s) == [2.0, 2.0] - @test Manopt.set_manopt_parameter!(ck, :β, 2.0) == ck - @test Manopt.get_manopt_parameter(ck, :β) == 2.0 + @test Manopt.set_parameter!(ck, :μ, [2.0, 2.0]) == ck + @test Manopt.get_parameter(ck, :μ) == [2.0, 2.0] + @test Manopt.set_parameter!(ck, :s, [2.0, 2.0]) == ck + @test Manopt.get_parameter(ck, :s) == [2.0, 2.0] + @test Manopt.set_parameter!(ck, :β, 2.0) == ck + @test Manopt.get_parameter(ck, :β) == 2.0 end end end @@ -559,10 +559,10 @@ include("../utils/dummy_types.jl") @test gALC(M, p) ≈ ag gALC(M, X, p) @test gALC(M, X, p) ≈ ag - @test Manopt.set_manopt_parameter!(ALC, :ρ, 2 * ρ) == ALC - @test Manopt.get_manopt_parameter(ALC, :ρ) == 2 * ρ - @test Manopt.set_manopt_parameter!(gALC, :ρ, 2 * ρ) == gALC - @test Manopt.get_manopt_parameter(gALC, :ρ) == 2 * ρ + @test Manopt.set_parameter!(ALC, :ρ, 2 * ρ) == ALC + @test Manopt.get_parameter(ALC, :ρ) == 2 * ρ + @test Manopt.set_parameter!(gALC, :ρ, 2 * ρ) == gALC + @test Manopt.get_parameter(gALC, :ρ) == 2 * ρ end end end diff --git a/test/plans/test_debug.jl b/test/plans/test_debug.jl index dde81d62df..fc74e0ee9a 100644 --- a/test/plans/test_debug.jl +++ b/test/plans/test_debug.jl @@ -15,10 +15,10 @@ Manopt.get_message(::TestMessageState) = "DebugTest" mutable struct TestDebugParameterState <: 
AbstractManoptSolverState value::Int end -function Manopt.set_manopt_parameter!(d::TestDebugParameterState, ::Val{:value}, v) +function Manopt.set_parameter!(d::TestDebugParameterState, ::Val{:value}, v) (d.value = v; return d) end -Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value +Manopt.get_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value @testset "Debug State" begin # helper to get debug as string @@ -27,7 +27,10 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value M = ManifoldsBase.DefaultManifold(2) p = [4.0, 2.0] st = GradientDescentState( - M, p; stopping_criterion=StopAfterIteration(20), stepsize=ConstantStepsize(M) + M; + p=p, + stopping_criterion=StopAfterIteration(10), + stepsize=Manopt.ConstantStepsize(M), ) f(M, q) = distance(M, q, p) .^ 2 grad_f(M, q) = -2 * log(M, q, p) @@ -43,6 +46,10 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value @test DebugSolverState(st, ["|"]).debugDictionary[:Iteration].divider == a1.divider @test endswith(repr(DebugSolverState(st, a1)), "\"|\"") @test repr(DebugSolverState(st, Dict{Symbol,DebugAction}())) == repr(st) + # Passthrough + dss = DebugSolverState(st, a1) + Manopt.set_parameter!(dss, :StoppingCriterion, :MaxIteration, 20) + @test dss.state.stop.max_iterations == 20 #Maybe turn into a getter? 
# single AbstractStateActions # DebugDivider a1(mp, st, 0) @@ -240,14 +247,17 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value @testset "Debug and parameter passthrough" begin s = TestDebugParameterState(0) d = DebugSolverState(s, DebugDivider(" | ")) - Manopt.set_manopt_parameter!(d, :value, 1) - @test Manopt.get_manopt_parameter(d, :value) == 1 + Manopt.set_parameter!(d, :value, 1) + @test Manopt.get_parameter(d, :value) == 1 end @testset "Debug Warnings" begin M = ManifoldsBase.DefaultManifold(2) p = [4.0, 2.0] st = GradientDescentState( - M, p; stopping_criterion=StopAfterIteration(20), stepsize=ConstantStepsize(M) + M; + p=p, + stopping_criterion=StopAfterIteration(20), + stepsize=Manopt.ConstantStepsize(M), ) f(M, y) = Inf grad_f(M, y) = Inf .* ones(2) @@ -292,7 +302,10 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value M = ManifoldsBase.DefaultManifold(2) p = [4.0, 2.0] st = GradientDescentState( - M, p; stopping_criterion=StopAfterIteration(20), stepsize=ConstantStepsize(M) + M; + p=p, + stopping_criterion=StopAfterIteration(20), + stepsize=Manopt.ConstantStepsize(M), ) f(M, q) = distance(M, q, p) .^ 2 grad_f(M, q) = -2 * log(M, q, p) @@ -376,7 +389,10 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value M = ManifoldsBase.DefaultManifold(2) p = [-4.0, 2.0] st = GradientDescentState( - M, p; stopping_criterion=StopAfterIteration(20), stepsize=ConstantStepsize(M) + M; + p=p, + stopping_criterion=StopAfterIteration(20), + stepsize=Manopt.ConstantStepsize(M), ) f(M, y) = Inf grad_f(M, y) = Inf .* ones(2) @@ -398,7 +414,10 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value M = ManifoldsBase.DefaultManifold(2) p = [4.0, 2.0] st = GradientDescentState( - M, p; stopping_criterion=StopAfterIteration(20), stepsize=ConstantStepsize(M) + M; + p=p, + stopping_criterion=StopAfterIteration(20), + stepsize=Manopt.ConstantStepsize(M), ) f(M, q) 
= distance(M, q, p) .^ 2 grad_f(M, q) = -2 * log(M, q, p) @@ -406,8 +425,8 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value dD = DebugDivider(" | "; io=io) dA = DebugWhenActive(dD, false) @test !dA.active - set_manopt_parameter!(dA, :Dummy, true) # pass down - set_manopt_parameter!(dA, :Activity, true) # activate + set_parameter!(dA, :Dummy, true) # pass down + set_parameter!(dA, :Activity, true) # activate @test dA.active @test repr(dA) == "DebugWhenActive($(repr(dD)), true, true)" @test Manopt.status_summary(dA) == repr(dA) @@ -417,13 +436,13 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value dE = DebugEvery(dA, 2) dE(mp, st, 2) @test endswith(String(take!(io)), " | ") - set_manopt_parameter!(dE, :Activity, false) # deactivate + set_parameter!(dE, :Activity, false) # deactivate dE(mp, st, -1) # test that reset is still working dE(mp, st, 2) @test endswith(String(take!(io)), "") @test !dA.active dG = DebugGroup([dA]) - set_manopt_parameter!(dG, :Activity, true) # activate in group + set_parameter!(dG, :Activity, true) # activate in group dG(mp, st, 2) @test endswith(String(take!(io)), " | ") # test its usage in the factory independent of position @@ -431,7 +450,7 @@ Manopt.get_manopt_parameter(d::TestDebugParameterState, ::Val{:value}) = d.value @test DebugFactory([:WhenActive, " | "])[:Iteration] isa DebugWhenActive dst = DebugSolverState(st, dA) - set_manopt_parameter!(dst, :Debug, :Activity, true) + set_parameter!(dst, :Debug, :Activity, true) @test dA.active end end diff --git a/test/plans/test_defaults_factory.jl b/test/plans/test_defaults_factory.jl new file mode 100644 index 0000000000..3270ddc047 --- /dev/null +++ b/test/plans/test_defaults_factory.jl @@ -0,0 +1,19 @@ +using Manopt, Manifolds, Test + +# A rule that does not need a manifold but has defaults +struct FactoryDummyRule{R<:Real} + t::R + FactoryDummyRule(; t::R=1.0) where {R<:Real} = new{R}(t) +end + +@testset
"ManifoldDefaultsFactory" begin + fdr = Manopt.ManifoldDefaultsFactory( + FactoryDummyRule, Sphere(2); requires_manifold=false, t=2.0 + ) + @test fdr().t == 2.0 + @test fdr(Euclidean(2)).t == 2.0 + @test startswith(repr(fdr), "ManifoldDefaultsFactory(FactoryDummyRule)") + # A case without a manifold, with a positional argument instead. + fdr2 = Manopt.ManifoldDefaultsFactory(FactoryDummyRule, 1) + @test startswith(repr(fdr2), "ManifoldDefaultsFactory(FactoryDummyRule)") +end diff --git a/test/plans/test_gradient_plan.jl b/test/plans/test_gradient_plan.jl index f3e9dd5b2f..d2336b7a9c 100644 --- a/test/plans/test_gradient_plan.jl +++ b/test/plans/test_gradient_plan.jl @@ -8,7 +8,10 @@ include("../utils/dummy_types.jl") M = ManifoldsBase.DefaultManifold(2) p = [4.0, 2.0] gst = GradientDescentState( - M, zero(p); stopping_criterion=StopAfterIteration(20), stepsize=ConstantStepsize(M) + M; + p=zero(p), + stopping_criterion=StopAfterIteration(20), + stepsize=Manopt.ConstantStepsize(M), ) set_iterate!(gst, M, p) @test get_iterate(gst) == p diff --git a/test/plans/test_higher_order_primal_dual_plan.jl b/test/plans/test_higher_order_primal_dual_plan.jl index 7702883f9d..9e3c415527 100644 --- a/test/plans/test_higher_order_primal_dual_plan.jl +++ b/test/plans/test_higher_order_primal_dual_plan.jl @@ -1,4 +1,4 @@ -using Manopt, Manifolds, ManifoldsBase, Test +using Manopt, Manifolds, ManifoldsBase, Test, RecursiveArrayTools using ManoptExamples: forward_logs, adjoint_differential_forward_logs using ManifoldDiff: differential_shortest_geodesic_startpoint, diff --git a/test/plans/test_interior_point_newton_plan.jl b/test/plans/test_interior_point_newton_plan.jl index 148d38c2fe..b4afe14916 100644 --- a/test/plans/test_interior_point_newton_plan.jl +++ b/test/plans/test_interior_point_newton_plan.jl @@ -1,4 +1,4 @@ -using ManifoldsBase, Manifolds, Manopt, Test +using ManifoldsBase, Manifolds, Manopt, Test, RecursiveArrayTools @testset "InteriorPointNewtonState" begin M =
ManifoldsBase.DefaultManifold(3) @@ -61,7 +61,7 @@ using ManifoldsBase, Manifolds, Manopt, Test sub_state = ConjugateResidualState(TangentSpace(sub_M, sub_p), sub_obj) dmp = DefaultManoptProblem(M, coh) ipns = InteriorPointNewtonState( - M, coh, p, DefaultManoptProblem(sub_M, sub_obj), sub_state + M, coh, DefaultManoptProblem(sub_M, sub_obj), sub_state; p=p ) # Getters & Setters @test length(Manopt.get_message(ipns)) == 0 @@ -83,11 +83,11 @@ using ManifoldsBase, Manifolds, Manopt, Test @test length(get_reason(sc)) > 0 # ipcc = InteriorPointCentralityCondition(coh, 1.0) - @test Manopt.set_manopt_parameter!(ipcc, :τ, step_M, step_p) == ipcc - @test Manopt.set_manopt_parameter!(ipcc, :γ, 2.0) == ipcc - @test Manopt.get_manopt_parameter(ipcc, :γ) == 2.0 - @test Manopt.get_manopt_parameter(ipcc, :τ1) == 2 / 3 - @test Manopt.get_manopt_parameter(ipcc, :τ2) ≈ 0.2809757 atol = 1e-7 + @test Manopt.set_parameter!(ipcc, :τ, step_M, step_p) == ipcc + @test Manopt.set_parameter!(ipcc, :γ, 2.0) == ipcc + @test Manopt.get_parameter(ipcc, :γ) == 2.0 + @test Manopt.get_parameter(ipcc, :τ1) == 2 / 3 + @test Manopt.get_parameter(ipcc, :τ2) ≈ 0.2809757 atol = 1e-7 @test !ipcc(step_M, step_p) ipcc.τ1 = 0.01 # trick conditions so ipcc succeeds ipcc.τ2 = 0.01 diff --git a/test/plans/test_nelder_mead_plan.jl b/test/plans/test_nelder_mead_plan.jl index a8c5daa64d..6173a0d8b1 100644 --- a/test/plans/test_nelder_mead_plan.jl +++ b/test/plans/test_nelder_mead_plan.jl @@ -4,7 +4,7 @@ using Manifolds, Manopt, Test @testset "Nelder Mead State" begin M = Euclidean(2) o = NelderMeadState(M) - o2 = NelderMeadState(M, o.population) + o2 = NelderMeadState(M; population=o.population) @test o.p == o2.p @test o.population == o2.population @test get_state(o) == o diff --git a/test/plans/test_objective.jl b/test/plans/test_objective.jl index 804ead737a..ab97217adc 100644 --- a/test/plans/test_objective.jl +++ b/test/plans/test_objective.jl @@ -23,10 +23,10 @@ include("../utils/dummy_types.jl") r2 = 
Manopt.ReturnManifoldObjective(d) @test repr(r) == "ManifoldCostObjective{AllocatingEvaluation}" end - @testset "set_manopt_parameter!" begin + @testset "set_parameter!" begin o = ManifoldCostObjective(x -> x) mp = DefaultManoptProblem(ManifoldsBase.DefaultManifold(2), o) - set_manopt_parameter!(mp, :Objective, :Dummy, 1) + set_parameter!(mp, :Objective, :Dummy, 1) end @testset "functions" begin M = ManifoldsBase.DefaultManifold(2) diff --git a/test/plans/test_parameters.jl b/test/plans/test_parameters.jl index 6f1478d1db..b7e50a3682 100644 --- a/test/plans/test_parameters.jl +++ b/test/plans/test_parameters.jl @@ -2,14 +2,14 @@ using Manifolds, Manopt, Test, ManifoldsBase @testset "Generic Parameters" begin # test one internal fallback - Manopt.get_manopt_parameter(:None, Val(:default)) === nothing - @test_logs (:info, "Setting the `Manopt.jl` parameter :TestValue to Å.") Manopt.set_manopt_parameter!( + Manopt.get_parameter(:None, Val(:default)) === nothing + @test_logs (:info, "Setting the `Manopt.jl` parameter :TestValue to Å.") Manopt.set_parameter!( :TestValue, "Å" ) - @test Manopt.get_manopt_parameter(:TestValue) == "Å" - @test Manopt.get_manopt_parameter(:TestValue, :Dummy) == "Å" # Dispatch ignores second symbol - @test_logs (:info, "Resetting the `Manopt.jl` parameter :TestValue to default.") Manopt.set_manopt_parameter!( + @test Manopt.get_parameter(:TestValue) == "Å" + @test Manopt.get_parameter(:TestValue, :Dummy) == "Å" # Dispatch ignores second symbol + @test_logs (:info, "Resetting the `Manopt.jl` parameter :TestValue to default.") Manopt.set_parameter!( :TestValue, "" ) # reset - @test Manopt.get_manopt_parameter(:TestValue) === nothing + @test Manopt.get_parameter(:TestValue) === nothing end diff --git a/test/plans/test_primal_dual_plan.jl b/test/plans/test_primal_dual_plan.jl index feeaa037c4..c28228a69a 100644 --- a/test/plans/test_primal_dual_plan.jl +++ b/test/plans/test_primal_dual_plan.jl @@ -1,4 +1,5 @@ using Manopt, Manifolds, 
ManifoldsBase, ManifoldDiff, ManoptExamples, Test +using RecursiveArrayTools using ManoptExamples: forward_logs, @@ -78,8 +79,8 @@ include("../utils/dummy_types.jl") f, prox_f, prox_g_dual, adjoint_DΛ; linearized_forward_operator=DΛ ) p_linearized = TwoManifoldProblem(M, N, pdmol) - s_exact = ChambollePockState(M, m, n, zero.(p0), X0; variant=:exact) - s_linearized = ChambollePockState(M, m, n, p0, X0; variant=:linearized) + s_exact = ChambollePockState(M; m=m, n=n, p=zero.(p0), X=X0, variant=:exact) + s_linearized = ChambollePockState(M; m=m, n=n, p=p0, X=X0, variant=:linearized) n_old = ArrayPartition(n[N, :point], n[N, :vector]) p_old = copy(p0) ξ_old = ArrayPartition(X0[N, :point], X0[N, :vector]) @@ -88,11 +89,11 @@ include("../utils/dummy_types.jl") @test all(get_iterate(s_exact) .== p0) osm = PrimalDualSemismoothNewtonState( - M, - m, - n, - zero.(p0), - X0; + M; + m=m, + n=n, + p=zero.(p0), + X=X0, primal_stepsize=0.0, dual_stepsize=0.0, regularization_parameter=0.0, @@ -168,8 +169,8 @@ include("../utils/dummy_types.jl") f, prox_f, prox_g_dual, adjoint_DΛ; linearized_forward_operator=DΛ ) p_linearized = TwoManifoldProblem(M, N, pmdol) - s_exact = ChambollePockState(M, m, n, p0, X0; variant=:exact) - s_linearized = ChambollePockState(M, m, n, p0, X0; variant=:linearized) + s_exact = ChambollePockState(M; m=m, n=n, p=p0, X=X0, variant=:exact) + s_linearized = ChambollePockState(M; m=m, n=n, p=p0, X=X0, variant=:linearized) @test primal_residual(p_exact, s_exact, p_old, ξ_old, n_old) ≈ 0 atol = 1e-16 @test primal_residual(p_linearized, s_linearized, p_old, ξ_old, n_old) ≈ 0 atol = 1e-16 @@ -184,7 +185,7 @@ include("../utils/dummy_types.jl") @test dual_residual(p_exact, s_exact, p_old, ξ_old, n_old) > 4.0 @test dual_residual(p_linearized, s_linearized, p_old, ξ_old, n_old) > 0 - o_err = ChambollePockState(M, m, n, p0, X0; variant=:err) + o_err = ChambollePockState(M; m=m, n=n, p=p0, X=X0, variant=:err) @test_throws DomainError dual_residual(p_exact, o_err, 
p_old, ξ_old, n_old) end @testset "Debug prints" begin diff --git a/test/plans/test_problem.jl b/test/plans/test_problem.jl index 15933a296c..fec08edc16 100644 --- a/test/plans/test_problem.jl +++ b/test/plans/test_problem.jl @@ -14,14 +14,14 @@ using Manopt, Manifolds, Test cpi = DefaultManoptProblem(M, moi) @test Manopt.evaluation_type(cpi) === InplaceEvaluation end - @testset "set_manopt_parameter functions" begin + @testset "set_parameter functions" begin f(M, p) = 1 # dummy cost mco = ManifoldCostObjective(f) dmp = DefaultManoptProblem(Euclidean(3), mco) # has no effect but does not error - set_manopt_parameter!(f, :Dummy, 1) - set_manopt_parameter!(dmp, :Cost, :Dummy, 1) - set_manopt_parameter!(mco, :Cost, :Dummy, 1) + set_parameter!(f, :Dummy, 1) + set_parameter!(dmp, :Cost, :Dummy, 1) + set_parameter!(mco, :Cost, :Dummy, 1) # but the objective here does not have a gradient end end diff --git a/test/plans/test_record.jl b/test/plans/test_record.jl index 28bdb72dba..c947a13d8a 100644 --- a/test/plans/test_record.jl +++ b/test/plans/test_record.jl @@ -3,10 +3,10 @@ using Manopt: RecordFactory, RecordGroupFactory, RecordActionFactory mutable struct TestRecordParameterState <: AbstractManoptSolverState value::Int end -function Manopt.set_manopt_parameter!(d::TestRecordParameterState, ::Val{:value}, v) +function Manopt.set_parameter!(d::TestRecordParameterState, ::Val{:value}, v) (d.value = v; return d) end -Manopt.get_manopt_parameter(d::TestRecordParameterState, ::Val{:value}) = d.value +Manopt.get_parameter(d::TestRecordParameterState, ::Val{:value}) = d.value @testset "Record State" begin # helper to get debug as string @@ -14,7 +14,10 @@ Manopt.get_manopt_parameter(d::TestRecordParameterState, ::Val{:value}) = d.valu M = ManifoldsBase.DefaultManifold(2) p = [4.0, 2.0] gds = GradientDescentState( - M, copy(p); stopping_criterion=StopAfterIteration(20), stepsize=ConstantStepsize(M) + M; + p=copy(p), + stopping_criterion=StopAfterIteration(10), + 
stepsize=Manopt.ConstantStepsize(M), ) f(M, q) = distance(M, q, p) .^ 2 grad_f(M, q) = -2 * log(M, q, p) @@ -24,11 +27,13 @@ Manopt.get_manopt_parameter(d::TestRecordParameterState, ::Val{:value}) = d.valu @test Manopt.status_summary(a) == ":Iteration" # constructors rs = RecordSolverState(gds, a) - Manopt.set_manopt_parameter!(rs, :Record, RecordCost()) + Manopt.set_parameter!(rs, :Record, RecordCost()) @test Manopt.dispatch_state_decorator(rs) === Val{true}() @test get_state(gds) == gds @test get_state(rs) == gds @test_throws MethodError get_state(dmp) + Manopt.set_parameter!(rs, :StoppingCriterion, :MaxIteration, 20) + @test rs.state.stop.max_iterations == 20 #Maybe turn into a getter? # @test get_initial_stepsize(dmp, rs) == 1.0 @test get_stepsize(dmp, rs, 1) == 1.0 @@ -194,7 +199,7 @@ Manopt.get_manopt_parameter(d::TestRecordParameterState, ::Val{:value}) = d.valu rss = RecordSubsolver() @test repr(rss) == "RecordSubsolver(; record=[:Iteration], record_type=Any)" @test Manopt.status_summary(rss) == ":Subsolver" - epms = ExactPenaltyMethodState(M, rand(M), dmp, rs) + epms = ExactPenaltyMethodState(M, dmp, rs) rss(dmp, epms, 1) end @testset "RecordWhenActive" begin @@ -207,9 +212,9 @@ Manopt.get_manopt_parameter(d::TestRecordParameterState, ::Val{:value}) = d.valu rwa(dmp, gds, -1) # Reset @test length(get_record(rwa)) == 0 rwa(dmp, gds, 1) - set_manopt_parameter!(rwa, :Activity, false) + set_parameter!(rwa, :Activity, false) # passthrough to inner - set_manopt_parameter!(rwa, :test, 1) + set_parameter!(rwa, :test, 1) @test !rwa.active # test inactive rwa(dmp, gds, 2) @@ -304,7 +309,7 @@ Manopt.get_manopt_parameter(d::TestRecordParameterState, ::Val{:value}) = d.valu @testset "Record and parameter passthrough" begin s = TestRecordParameterState(0) r = RecordSolverState(s, RecordIteration()) - Manopt.set_manopt_parameter!(r, :value, 1) - @test Manopt.get_manopt_parameter(r, :value) == 1 + Manopt.set_parameter!(r, :value, 1) + @test Manopt.get_parameter(r, :value) 
== 1 end end diff --git a/test/plans/test_state.jl b/test/plans/test_state.jl index fc77d189f0..5a7f9cbfb7 100644 --- a/test/plans/test_state.jl +++ b/test/plans/test_state.jl @@ -12,15 +12,16 @@ struct NoIterateState <: AbstractManoptSolverState end s = DummyState() @test repr(Manopt.ReturnSolverState(s)) == "ReturnSolverState($s)" @test Manopt.status_summary(Manopt.ReturnSolverState(s)) == "DummyState(Float64[])" - a = ArmijoLinesearch(M; initial_stepsize=1.0) + a = ArmijoLinesearch(; initial_stepsize=1.0)(M) @test get_last_stepsize(a) == 1.0 @test get_initial_stepsize(a) == 1.0 - set_manopt_parameter!(s, :Dummy, 1) + set_parameter!(s, :Dummy, 1) end @testset "Decreasing Stepsize" begin - dec_step = DecreasingStepsize(; - length=10.0, factor=1.0, subtrahend=0.0, exponent=1.0 + M = Euclidean(3) + dec_step = DecreasingLength(; length=10.0, factor=1.0, subtrahend=0.0, exponent=1.0)( + M ) @test get_initial_stepsize(dec_step) == 10.0 M = Euclidean(3) @@ -71,6 +72,7 @@ struct NoIterateState <: AbstractManoptSolverState end s2 = NoIterateState() @test_throws ErrorException get_iterate(s2) end + @testset "Iteration and Gradient setters" begin M = Euclidean(3) s1 = NelderMeadState(M) @@ -89,10 +91,12 @@ struct NoIterateState <: AbstractManoptSolverState end @test d2.state.X == ones(3) @test get_stopping_criterion(d2) === s2.stop end + @testset "Closed Form State" begin @test Manopt.ClosedFormSubSolverState() isa Manopt.ClosedFormSubSolverState{AllocatingEvaluation} end + @testset "Generic Objective and State solver returns" begin f(M, p) = 1 o = ManifoldCostObjective(f) diff --git a/test/plans/test_stepsize.jl b/test/plans/test_stepsize.jl index 66ff5df647..4fb25bb90d 100644 --- a/test/plans/test_stepsize.jl +++ b/test/plans/test_stepsize.jl @@ -1,37 +1,35 @@ -using Manopt, Manifolds, Test +using ManifoldsBase, Manopt, Manifolds, Test @testset "Stepsize" begin - @test Manopt.get_message(ConstantStepsize(1.0)) == "" - s = ArmijoLinesearch() - @test startswith(repr(s), 
"ArmijoLinesearch() with keyword parameters\n") + M = ManifoldsBase.DefaultManifold(2) + @test Manopt.get_message(Manopt.ConstantStepsize(M, 1.0)) == "" + s = Manopt.ArmijoLinesearchStepsize(Euclidean()) + @test startswith(repr(s), "ArmijoLinesearch(;") s_stat = Manopt.status_summary(s) - @test startswith(s_stat, "ArmijoLinesearch() with keyword parameters\n") + @test startswith(s_stat, "ArmijoLinesearch(;") @test endswith(s_stat, "of 1.0") @test Manopt.get_message(s) == "" - s2 = NonmonotoneLinesearch() - @test startswith(repr(s2), "NonmonotoneLinesearch() with keyword arguments\n") + s2 = NonmonotoneLinesearch()(M) + @test startswith(repr(s2), "NonmonotoneLinesearch(;") @test Manopt.get_message(s2) == "" - s2b = NonmonotoneLinesearch(Euclidean(2)) # with manifold -> faster storage - @test startswith(repr(s2b), "NonmonotoneLinesearch() with keyword arguments\n") - - s3 = WolfePowellBinaryLinesearch() + s3 = WolfePowellBinaryLinesearch()(M) @test Manopt.get_message(s3) == "" - @test startswith(repr(s3), "WolfePowellBinaryLinesearch(DefaultManifold(), ") + @test startswith(repr(s3), "WolfePowellBinaryLinesearch(;") # no stepsize yet so `repr` and summary are the same @test repr(s3) == Manopt.status_summary(s3) - s4 = WolfePowellLinesearch() - @test startswith(repr(s4), "WolfePowellLinesearch(DefaultManifold(), ") + s4 = WolfePowellLinesearch()(M) + @test startswith(repr(s4), "WolfePowellLinesearch(;") # no stepsize yet so `repr` and summary are the same @test repr(s4) == Manopt.status_summary(s4) @test Manopt.get_message(s4) == "" @testset "Armijo setter / getters" begin # Check that the passdowns work, though; since the defaults are functions, they return nothing - @test isnothing(Manopt.get_manopt_parameter(s, :IncreaseCondition, :Dummy)) - @test isnothing(Manopt.get_manopt_parameter(s, :DecreaseCondition, :Dummy)) - @test Manopt.set_manopt_parameter!(s, :IncreaseCondition, :Dummy, 1) == s #setters return s - @test Manopt.set_manopt_parameter!(s, 
:DecreaseCondition, :Dummy, 1) == s + @test isnothing(Manopt.get_parameter(s, :IncreaseCondition, :Dummy)) + @test isnothing(Manopt.get_parameter(s, :DecreaseCondition, :Dummy)) + @test Manopt.set_parameter!(s, :IncreaseCondition, :Dummy, 1) == s #setters return s + @test Manopt.set_parameter!(s, :DecreaseCondition, :Dummy, 1) == s end @testset "Linesearch safeguards" begin M = Euclidean(2) @@ -76,8 +74,8 @@ using Manopt, Manifolds, Test p = [1.0, 0.0, 0.0] mgo = ManifoldGradientObjective(f, grad_f) mp = DefaultManoptProblem(M, mgo) - s = AdaptiveWNGradient(; gradient_reduction=0.5, count_threshold=2) - gds = GradientDescentState(M, p) + s = AdaptiveWNGradient(; gradient_reduction=0.5, count_threshold=2)(M) + gds = GradientDescentState(M; p=p) @test get_initial_stepsize(s) == 1.0 @test get_last_stepsize(s) == 1.0 @test s(mp, gds, 0) == 1.0 @@ -94,6 +92,7 @@ using Manopt, Manifolds, Test @test startswith(repr(s), "AdaptiveWNGradient(;\n ") end @testset "Absolute stepsizes" begin + M = ManifoldsBase.DefaultManifold(2) # Build a dummy function and gradient f(M, p) = 0 grad_f(M, p) = [0.0, 0.75, 0.0] # valid, since only north pole used @@ -101,14 +100,14 @@ using Manopt, Manifolds, Test p = [1.0, 0.0, 0.0] mgo = ManifoldGradientObjective(f, grad_f) mp = DefaultManoptProblem(M, mgo) - gds = GradientDescentState(M, p) - abs_dec_step = DecreasingStepsize(; - length=10.0, factor=1.0, subtrahend=0.0, exponent=1.0, type=:absolute + gds = GradientDescentState(M; p=p) + abs_dec_step = Manopt.DecreasingStepsize( + M; length=10.0, factor=1.0, subtrahend=0.0, exponent=1.0, type=:absolute ) solve!(mp, gds) @test abs_dec_step(mp, gds, 1) == 10.0 / norm(get_manifold(mp), get_iterate(gds), get_gradient(gds)) - abs_const_step = ConstantStepsize(1.0, :absolute) + abs_const_step = Manopt.ConstantStepsize(M, 1.0; type=:absolute) @test abs_const_step(mp, gds, 1) == 1.0 / norm(get_manifold(mp), get_iterate(gds), get_gradient(gds)) end @@ -119,10 +118,10 @@ using Manopt, Manifolds, Test dmp 
= DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) p = [2.0, 2.0] X = grad_f(M, p) - sgs = SubGradientMethodState(M, p) - ps = PolyakStepsize() + sgs = SubGradientMethodState(M; p=p) + ps = Polyak()() @test repr(ps) == - "PolyakStepsize() with keyword parameters\n * initial_cost_estimate = 0.0" + "Polyak()\nA stepsize with keyword parameters\n * initial_cost_estimate = 0.0\n" @test ps(dmp, sgs, 1) == (f(M, p) - 0 + 1) / (norm(M, p, X)^2) end end diff --git a/test/plans/test_stopping_criteria.jl b/test/plans/test_stopping_criteria.jl index 156a24f212..f98822fb67 100644 --- a/test/plans/test_stopping_criteria.jl +++ b/test/plans/test_stopping_criteria.jl @@ -9,7 +9,7 @@ struct DummyStoppingCriterion <: StoppingCriterion end @testset "Generic Tests" begin @test_throws ErrorException get_stopping_criteria(myStoppingCriteriaSet()) - s = StopWhenAll(StopAfterIteration(10), StopWhenChangeLess(0.1)) + s = StopWhenAll(StopAfterIteration(10), StopWhenChangeLess(Euclidean(), 0.1)) @test Manopt.indicates_convergence(s) #due to all and change this is true @test startswith(repr(s), "StopWhenAll with the") @test get_reason(s) === "" @@ -17,7 +17,7 @@ struct DummyStoppingCriterion <: StoppingCriterion end s.criteria[2].last_change = 0.05 s.criteria[2].at_iteration = 3 @test length(get_reason(s.criteria[2])) > 0 - s2 = StopWhenAll([StopAfterIteration(10), StopWhenChangeLess(0.1)]) + s2 = StopWhenAll([StopAfterIteration(10), StopWhenChangeLess(Euclidean(), 0.1)]) @test get_stopping_criteria(s)[1].max_iterations == get_stopping_criteria(s2)[1].max_iterations @@ -25,7 +25,7 @@ struct DummyStoppingCriterion <: StoppingCriterion end p = DefaultManoptProblem( Euclidean(), ManifoldGradientObjective((M, x) -> x^2, x -> 2x) ) - s = GradientDescentState(Euclidean(), 1.0) + s = GradientDescentState(Euclidean(); p=1.0) @test !s3(p, s, 1) @test length(get_reason(s3)) == 0 s.p = 0.3 @@ -57,7 +57,7 @@ struct DummyStoppingCriterion <: StoppingCriterion end an = get_reason(sm) m = 
match(r"^((.*)\n)+", an) @test length(m.captures) == 2 # both have to be active - update_stopping_criterion!(s3, :MinCost, 1e-2) + set_parameter!(s3, :MinCost, 1e-2) @test s3.threshold == 1e-2 # Dummy without iterations has a reasonable fallback @test Manopt.get_count(DummyStoppingCriterion(), Val(:Iterations)) == 0 @@ -80,15 +80,15 @@ struct DummyStoppingCriterion <: StoppingCriterion end @test s(p, o, 2) == true @test length(get_reason(s)) > 0 @test_throws ErrorException StopAfter(Second(-1)) - @test_throws ErrorException update_stopping_criterion!(s, :MaxTime, Second(-1)) - update_stopping_criterion!(s, :MaxTime, Second(2)) + @test_throws ErrorException set_parameter!(s, :MaxTime, Second(-1)) + set_parameter!(s, :MaxTime, Second(2)) @test s.threshold == Second(2) end @testset "Stopping Criterion &/| operators" begin a = StopAfterIteration(200) - b = StopWhenChangeLess(1e-6) - sb = "StopWhenChangeLess(1.0e-6)\n $(Manopt.status_summary(b))" + b = StopWhenChangeLess(Euclidean(), 1e-6) + sb = "StopWhenChangeLess with threshold 1.0e-6\n $(Manopt.status_summary(b))" @test repr(b) == sb @test get_reason(b) == "" b2 = StopWhenChangeLess(Euclidean(), 1e-6) # second constructor @@ -108,14 +108,14 @@ struct DummyStoppingCriterion <: StoppingCriterion end @test typeof(d) === typeof(a & b & c) @test typeof(d) === typeof(a & (b & c)) @test typeof(d) === typeof((a & b) & c) - update_stopping_criterion!(d, :MinIterateChange, 1e-8) + set_parameter!(d, :MinIterateChange, 1e-8) @test d.criteria[2].threshold == 1e-8 @test length((d & d).criteria) == 6 e = a | b | c @test typeof(e) === typeof(a | b | c) @test typeof(e) === typeof(a | (b | c)) @test typeof(e) === typeof((a | b) | c) - update_stopping_criterion!(e, :MinGradNorm, 1e-9) + set_parameter!(e, :MinGradNorm, 1e-9) @test e.criteria[3].threshold == 1e-9 @test length((e | e).criteria) == 6 end @@ -158,7 +158,7 @@ struct DummyStoppingCriterion <: StoppingCriterion end @test endswith(Manopt.status_summary(swgcl), "reached") @test 
length(get_reason(swgcl)) > 0 end - update_stopping_criterion!(swgcl2, :MinGradientChange, 1e-9) + set_parameter!(swgcl2, :MinGradientChange, 1e-9) @test swgcl2.threshold == 1e-9 end @@ -167,7 +167,7 @@ struct DummyStoppingCriterion <: StoppingCriterion end ho = ManifoldHessianObjective(x -> x, (M, x) -> x, (M, x) -> x, x -> x) hp = DefaultManoptProblem(Euclidean(), ho) tcgs = TruncatedConjugateGradientState( - TangentSpace(Euclidean(), 1.0), 0.0; trust_region_radius=2.0, randomize=false + TangentSpace(Euclidean(), 1.0); X=0.0, trust_region_radius=2.0, randomize=false ) tcgs.model_value = 1.0 s = StopWhenModelIncreased() @@ -183,9 +183,9 @@ struct DummyStoppingCriterion <: StoppingCriterion end @test s2(hp, tcgs, 1) @test length(get_reason(s2)) > 0 s3 = StopWhenResidualIsReducedByFactorOrPower() - update_stopping_criterion!(s3, :ResidualFactor, 0.5) + set_parameter!(s3, :ResidualFactor, 0.5) @test s3.κ == 0.5 - update_stopping_criterion!(s3, :ResidualPower, 0.5) + set_parameter!(s3, :ResidualPower, 0.5) @test s3.θ == 0.5 @test get_reason(s3) == "" # Trigger manually @@ -197,20 +197,20 @@ struct DummyStoppingCriterion <: StoppingCriterion end mgo = ManifoldGradientObjective((M, x) -> x^2, x -> 2x) dmp = DefaultManoptProblem(Euclidean(), mgo) gds = GradientDescentState( - Euclidean(), - 1.0; + Euclidean(); + p=1.0, stopping_criterion=StopAfterIteration(100), - stepsize=ConstantStepsize(Euclidean()), + stepsize=Manopt.ConstantStepsize(Euclidean()), ) s1 = StopWhenStepsizeLess(0.5) @test !s1(dmp, gds, 1) @test length(get_reason(s1)) == 0 - gds.stepsize = ConstantStepsize(; stepsize=0.25) + gds.stepsize = Manopt.ConstantStepsize(Euclidean(), 0.25) @test s1(dmp, gds, 2) @test length(get_reason(s1)) > 0 - update_stopping_criterion!(gds, :MaxIteration, 200) + set_parameter!(gds, :StoppingCriterion, :MaxIteration, 200) @test gds.stop.max_iterations == 200 - update_stopping_criterion!(s1, :MinStepsize, 1e-1) + set_parameter!(s1, :MinStepsize, 1e-1) @test s1.threshold == 1e-1 
end @@ -218,14 +218,14 @@ struct DummyStoppingCriterion <: StoppingCriterion end mgo = ManifoldGradientObjective((M, x) -> x^2, x -> 2x) dmp = DefaultManoptProblem(Euclidean(), mgo) gds = GradientDescentState( - Euclidean(), - 1.0; + Euclidean(); + p=1.0, stopping_criterion=StopAfterIteration(100), - stepsize=ConstantStepsize(Euclidean()), + stepsize=Manopt.ConstantStepsize(Euclidean()), ) swecl = StopWhenEntryChangeLess(:p, (p, s, v, w) -> norm(w - v), 1e-5) @test startswith(repr(swecl), "StopWhenEntryChangeLess\n") - update_stopping_criterion!(swecl, :Threshold, 1e-4) + set_parameter!(swecl, :Threshold, 1e-4) @test swecl.threshold == 1e-4 @test !swecl(dmp, gds, 1) #First call stores @test length(get_reason(swecl)) == 0 @@ -249,7 +249,7 @@ struct DummyStoppingCriterion <: StoppingCriterion end c2 = StopWhenSubgradientNormLess(1e-6) sc2 = "StopWhenSubgradientNormLess(1.0e-6)\n $(Manopt.status_summary(c2))" @test repr(c2) == sc2 - st = SubGradientMethodState(M, p; stopping_criterion=c2) + st = SubGradientMethodState(M; p=p, stopping_criterion=c2) st.X = ∂f(M, 2p) @test !c2(mp, st, 1) st.X = ∂f(M, p) @@ -258,7 +258,7 @@ struct DummyStoppingCriterion <: StoppingCriterion end c2(mp, st, 0) # verify that reset works @test length(get_reason(c2)) == 0 @test Manopt.indicates_convergence(c2) - update_stopping_criterion!(c2, :MinSubgradNorm, 1e-8) + set_parameter!(c2, :MinSubgradNorm, 1e-8) @test c2.threshold == 1e-8 end diff --git a/test/plans/test_storage.jl b/test/plans/test_storage.jl index ebc495757b..7ac9787b69 100644 --- a/test/plans/test_storage.jl +++ b/test/plans/test_storage.jl @@ -3,15 +3,17 @@ using Test, Manopt, ManifoldsBase, Manifolds @testset "StoreStateAction" begin @testset "manifold $M" for M in [ManifoldsBase.DefaultManifold(2), Circle()] if M isa Circle - p = 0.4 - X_zero = 0.0 + p = [0.4] else p = [4.0, 2.0] - X_zero = [0.0, 0.0] end + X_zero = zero_vector(M, p) st = GradientDescentState( - M, p; stopping_criterion=StopAfterIteration(20), 
stepsize=ConstantStepsize(M) + M; + p=p, + stopping_criterion=StopAfterIteration(20), + stepsize=Manopt.ConstantStepsize(M), ) f(M, q) = distance(M, q, p) .^ 2 grad_f(M, q) = -2 * log(M, q, p) @@ -65,4 +67,28 @@ using Test, Manopt, ManifoldsBase, Manifolds end @test Manopt.extract_type_from_namedtuple(typeof((; a=10, b='a')), Val(:c)) === Any + + @testset "StorageRef for numbers" begin + # Since wrapping now happens earlier, these might no longer be used in practice + M = ManifoldsBase.DefaultManifold() + q = Manopt._storage_copy_point(M, 1.0) + copyto!(q, 2.0) + @test q.x == 2.0 + Y = Manopt._storage_copy_vector(M, 3.0) + st = Manopt.StoreStateAction( + M; p_init=1.0, X_init=2.0, store_points=Tuple{:p}, store_vectors=Tuple{:X} + ) + @test Manopt.get_storage(st, Manopt.VectorStorageKey(:X)) == 2.0 + @test Manopt.get_storage(st, Manopt.PointStorageKey(:p)) == 1.0 + end + @testset "Store swarm" begin + M = ManifoldsBase.DefaultManifold(2) + A = [[1.0, 2.0], [3.0, 4.0]] + pss = ParticleSwarmState(M, A, [[5.0, 6.0], [7.0, 8.0]]) + a = StoreStateAction(M; store_fields=[:Population]) + f(M, q) = distance(M, q, [1.2, 1.3]) .^ 2 + mp = DefaultManoptProblem(M, ManifoldCostObjective(f)) + update_storage!(a, mp, pss) + @test get_storage(a, :Population) == A + end end diff --git a/test/runtests.jl b/test/runtests.jl index da48b4ce85..e254f85b13 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -23,6 +23,7 @@ include("utils/example_tasks.jl") include("plans/test_primal_dual_plan.jl") include("plans/test_proximal_plan.jl") include("plans/test_higher_order_primal_dual_plan.jl") + include("plans/test_defaults_factory.jl") include("plans/test_record.jl") include("plans/test_stepsize.jl") include("plans/test_stochastic_gradient_plan.jl") diff --git a/test/solvers/test_ChambollePock.jl b/test/solvers/test_ChambollePock.jl index 5c9c4cd976..6b0ac2070e 100644 --- a/test/solvers/test_ChambollePock.jl +++ b/test/solvers/test_ChambollePock.jl @@ -1,4 +1,4 @@ -using Manopt, Manifolds, ManifoldsBase,
Test +using Manopt, Manifolds, ManifoldsBase, Test, RecursiveArrayTools using ManoptExamples: forward_logs, differential_forward_logs, adjoint_differential_forward_logs using ManifoldDiff: prox_distance, prox_distance! diff --git a/test/solvers/test_Douglas_Rachford.jl b/test/solvers/test_Douglas_Rachford.jl index a1c25451bb..38034676d4 100644 --- a/test/solvers/test_Douglas_Rachford.jl +++ b/test/solvers/test_Douglas_Rachford.jl @@ -1,5 +1,5 @@ using Manifolds, Manopt, Test -using ManifoldDiff: prox_distance +using ManifoldDiff: prox_distance, prox_distance! @testset "DouglasRachford" begin # Though this seems a strange way, it is a way to compute the mid point @@ -38,7 +38,7 @@ using ManifoldDiff: prox_distance @test isapprox(M, q4, p_star_2; atol=1e-14) #test getter/set - s = DouglasRachfordState(M, d1) + s = DouglasRachfordState(M; p=d1) sr = "# Solver state for `Manopt.jl`s Douglas Rachford Algorithm\n" @test startswith(repr(s), sr) set_iterate!(s, d2) diff --git a/test/solvers/test_Frank_Wolfe.jl b/test/solvers/test_Frank_Wolfe.jl index a1af11ebea..3e2ac47524 100644 --- a/test/solvers/test_Frank_Wolfe.jl +++ b/test/solvers/test_Frank_Wolfe.jl @@ -32,14 +32,14 @@ using ManifoldsBase, Manopt, Random, Test, LinearAlgebra Y = similar(X) FG(M, Y, p) @test FG(M, p) == Y - s = FrankWolfeState(M, p, oracle!, InplaceEvaluation()) + s = FrankWolfeState(M, oracle!; evaluation=InplaceEvaluation(), p=p) @test Manopt.get_message(s) == "" @test startswith(repr(s), "# Solver state for `Manopt.jl`s Frank Wolfe Method\n") set_iterate!(s, 2 .* p) @test get_iterate(s) == 2 .* p dmp = DefaultManoptProblem(M, ManifoldGradientObjective(FC, FG)) gds = GradientDescentState(M) - s2 = FrankWolfeState(M, p, dmp, gds) + s2 = FrankWolfeState(M, dmp, gds; p=p) @test Manopt.get_message(s2) == "" end @testset "Two small Test runs" begin diff --git a/test/solvers/test_Levenberg_Marquardt.jl b/test/solvers/test_Levenberg_Marquardt.jl index 85b8ec6367..6c9e5c3b1b 100644 --- 
a/test/solvers/test_Levenberg_Marquardt.jl +++ b/test/solvers/test_Levenberg_Marquardt.jl @@ -200,9 +200,9 @@ end x0 = [4.0, 2.0] o_r2 = LevenbergMarquardtState( M, - x0, similar(x0, length(ts_r2)), similar(x0, 2 * length(ts_r2), 2); + p=x0, stopping_criterion=StopAfterIteration(20), ) p_r2 = DefaultManoptProblem( @@ -234,27 +234,27 @@ end @testset "errors" begin @test_throws ArgumentError LevenbergMarquardtState( M, - x0, similar(x0, length(ts_r2)), similar(x0, 2 * length(ts_r2), 2); + p=x0, stopping_criterion=StopAfterIteration(20), η=2, ) @test_throws ArgumentError LevenbergMarquardtState( M, - x0, similar(x0, length(ts_r2)), similar(x0, 2 * length(ts_r2), 2); + p=x0, stopping_criterion=StopAfterIteration(20), damping_term_min=-1, ) @test_throws ArgumentError LevenbergMarquardtState( M, - x0, similar(x0, length(ts_r2)), similar(x0, 2 * length(ts_r2), 2); + p=x0, stopping_criterion=StopAfterIteration(20), β=0.5, ) diff --git a/test/solvers/test_adaptive_regularization_with_cubics.jl b/test/solvers/test_adaptive_regularization_with_cubics.jl index 2ad5862532..65838aea0e 100644 --- a/test/solvers/test_adaptive_regularization_with_cubics.jl +++ b/test/solvers/test_adaptive_regularization_with_cubics.jl @@ -29,7 +29,7 @@ include("../utils/example_tasks.jl") Manopt.get_objective_gradient!(M, X0, arcmo, p0) isapprox(M, p0, X0, get_gradient(M, mho, p0)) - g = get_gradient_function(arcmo) + g = Manopt.get_gradient_function(arcmo) isapprox(M, p0, g(M2, p0), get_gradient(M, mho, p0)) X0 = zero_vector(M, p0) X1 = similar(X0) @@ -38,14 +38,11 @@ include("../utils/example_tasks.jl") end @testset "State and repr" begin - # neither provided a problem nor an objective yields an error - @test_throws ErrorException AdaptiveRegularizationState(ℝ^2) - arcs = AdaptiveRegularizationState( M, - p0; - sub_problem=DefaultManoptProblem(M2, arcmo), - sub_state=GradientDescentState(M2, zero_vector(M, p0)), + DefaultManoptProblem(M2, arcmo), + GradientDescentState(M2; p=zero_vector(M, p0)); 
+ p=p0, ) @test startswith( repr(arcs), @@ -59,9 +56,9 @@ include("../utils/example_tasks.jl") @test arcs.X == X1 arcs2 = AdaptiveRegularizationState( M, - p0; - sub_objective=arcmo, - sub_state=LanczosState(M2; maxIterLanczos=1), + DefaultManoptProblem(M2, arcmo), + LanczosState(M2; maxIterLanczos=1); + p=p0, stopping_criterion=StopWhenAllLanczosVectorsUsed(1), ) #add a fake Lanczos @@ -71,7 +68,7 @@ include("../utils/example_tasks.jl") @test stop_solver!(arcs2.sub_problem, arcs2, 1) arcs3 = AdaptiveRegularizationState( - M, p0; sub_objective=arcmo, sub_state=LanczosState(M2; maxIterLanczos=2) + M, DefaultManoptProblem(M2, arcmo), LanczosState(M2; maxIterLanczos=2); p=p0 ) #add a fake Lanczos initialize_solver!(arcs3.sub_problem, arcs3.sub_state) @@ -90,7 +87,7 @@ include("../utils/example_tasks.jl") ) # a second that copies arcs4 = AdaptiveRegularizationState( - M, p0; sub_objective=arcmo, sub_state=LanczosState(M2; maxIterLanczos=2) + M, DefaultManoptProblem(M2, arcmo), LanczosState(M2; maxIterLanczos=2); p=p0 ) #add a fake Lanczos push!(arcs4.sub_state.Lanczos_vectors, copy(M, p1, X1)) @@ -142,7 +139,7 @@ include("../utils/example_tasks.jl") ) @test r == p0 # Dummy construction with a function for the `sub_problem` - arcs4 = AdaptiveRegularizationState(M, p0; sub_problem=f1) + arcs4 = AdaptiveRegularizationState(M, f1; p=p0) @test arcs4.sub_state isa Manopt.ClosedFormSubSolverState end @@ -203,8 +200,8 @@ include("../utils/example_tasks.jl") sub_problem = DefaultManoptProblem(M2, arcmo) sub_state = GradientDescentState( - M2, - zero_vector(M, p0); + M2; + p=zero_vector(M, p0), stopping_criterion=StopAfterIteration(500) | StopWhenGradientNormLess(1e-11) | StopWhenFirstOrderProgress(0.1), @@ -243,11 +240,13 @@ include("../utils/example_tasks.jl") @test fc(Mc, p0) > fc(Mc, p2) end - @testset "Start at a point with _exactly_ gradient zero" begin + @testset "Start at a point with _exactly_ gradient zero - In Tutorial mode" begin p0 = zeros(2) M = Euclidean(2) + 
@test_logs (:info,) Manopt.set_parameter!(:Mode, "Tutorial") f2(M, p) = 0 grad_f2(M, p) = [0.0, 0.0] @test adaptive_regularization_with_cubics(M, f2, grad_f2, p0) == p0 + @test_logs (:info,) Manopt.set_parameter!(:Mode, "") end end diff --git a/test/solvers/test_alternating_gradient.jl b/test/solvers/test_alternating_gradient.jl index f7e235e731..e6d8f02aa0 100644 --- a/test/solvers/test_alternating_gradient.jl +++ b/test/solvers/test_alternating_gradient.jl @@ -1,4 +1,5 @@ -using Manopt, Manifolds, Test +using Manopt, Manifolds, Test, RecursiveArrayTools + @testset "Alternating Gradient Descent" begin # Note that this is merely an alternating gradient descent toy example M = Sphere(2) diff --git a/test/solvers/test_augmented_lagrangian.jl b/test/solvers/test_augmented_lagrangian.jl index 788021dfed..923ae3e8b5 100644 --- a/test/solvers/test_augmented_lagrangian.jl +++ b/test/solvers/test_augmented_lagrangian.jl @@ -35,7 +35,7 @@ using LinearAlgebra: I, tr # dummy ALM problem sp = DefaultManoptProblem(M, ManifoldCostObjective(f)) ss = NelderMeadState(M) - alms = AugmentedLagrangianMethodState(M, co, p0, sp, ss) + alms = AugmentedLagrangianMethodState(M, co, sp, ss; p=p0) set_iterate!(alms, M, 2 .* p0) @test Manopt.get_message(alms) == "" @test get_iterate(alms) == 2 .* p0 @@ -44,6 +44,9 @@ using LinearAlgebra: I, tr ) @test Manopt.get_sub_problem(alms) === sp @test Manopt.get_sub_state(alms) === ss + # With dummy closed form solution + almsc = AugmentedLagrangianMethodState(M, co, f) + @test almsc.sub_state isa Manopt.ClosedFormSubSolverState end @testset "Numbers" begin M = Euclidean() diff --git a/test/solvers/test_conjugate_gradient.jl b/test/solvers/test_conjugate_gradient.jl index 46ee8fc309..a3af7af7b0 100644 --- a/test/solvers/test_conjugate_gradient.jl +++ b/test/solvers/test_conjugate_gradient.jl @@ -11,7 +11,7 @@ include("../utils/example_tasks.jl") dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) x0 = [0.0, 1.0] sC = 
StopAfterIteration(1) - s = ConstantStepsize(M) + s = Manopt.ConstantStepsize(M) retr = ExponentialRetraction() vtm = ParallelTransport() @@ -21,10 +21,10 @@ include("../utils/example_tasks.jl") δ2 = [0.5, 2.0] diff = grad_2 - grad_1 - dU = SteepestDirectionUpdateRule() + dU = SteepestDescentCoefficient() s1 = ConjugateGradientDescentState( - M, - x0; + M; + p=x0, stopping_criterion=sC, stepsize=s, coefficient=dU, @@ -33,13 +33,13 @@ include("../utils/example_tasks.jl") initial_gradient=zero_vector(M, x0), ) @test s1.coefficient(dmp, s1, 1) == 0 - @test default_stepsize(M, typeof(s1)) isa ArmijoLinesearch + @test default_stepsize(M, typeof(s1)) isa Manopt.ArmijoLinesearchStepsize @test Manopt.get_message(s1) == "" - dU = ConjugateDescentCoefficient() + dU = Manopt.ConjugateDescentCoefficient() s2 = ConjugateGradientDescentState( - M, - x0; + M; + p=x0, stopping_criterion=sC, stepsize=s, coefficient=dU, @@ -57,8 +57,8 @@ include("../utils/example_tasks.jl") dU = DaiYuanCoefficient() s3 = ConjugateGradientDescentState( - M, - x0; + M; + p=x0, stopping_criterion=sC, stepsize=s, coefficient=dU, @@ -75,8 +75,8 @@ include("../utils/example_tasks.jl") dU = FletcherReevesCoefficient() s4 = ConjugateGradientDescentState( - M, - x0; + M; + p=x0, stopping_criterion=sC, stepsize=s, coefficient=dU, @@ -93,8 +93,8 @@ include("../utils/example_tasks.jl") dU = HagerZhangCoefficient() s5 = ConjugateGradientDescentState( - M, - x0; + M; + p=x0, stopping_criterion=sC, stepsize=s, coefficient=dU, @@ -114,8 +114,8 @@ include("../utils/example_tasks.jl") dU = HestenesStiefelCoefficient() s6 = ConjugateGradientDescentState( - M, - x0; + M; + p=x0, stopping_criterion=sC, stepsize=s, coefficient=dU, @@ -131,8 +131,8 @@ include("../utils/example_tasks.jl") dU = LiuStoreyCoefficient() s7 = ConjugateGradientDescentState( - M, - x0; + M; + p=x0, stopping_criterion=sC, stepsize=s, coefficient=dU, @@ -148,8 +148,8 @@ include("../utils/example_tasks.jl") dU = PolakRibiereCoefficient() s8 = 
ConjugateGradientDescentState( - M, - x0; + M; + p=x0, stopping_criterion=sC, stepsize=s, coefficient=dU, @@ -175,7 +175,7 @@ include("../utils/example_tasks.jl") f, grad_f, p0; - stepsize=ArmijoLinesearch(M), + stepsize=ArmijoLinesearch(), coefficient=FletcherReevesCoefficient(), stopping_criterion=StopAfterIteration(15), ) @@ -186,7 +186,7 @@ include("../utils/example_tasks.jl") f, grad_f, p0; - stepsize=ArmijoLinesearch(M), + stepsize=ArmijoLinesearch(), coefficient=FletcherReevesCoefficient(), stopping_criterion=StopAfterIteration(15), return_state=true, @@ -201,7 +201,7 @@ include("../utils/example_tasks.jl") M, f, grad_f; - stepsize=ArmijoLinesearch(M), + stepsize=ArmijoLinesearch(), coefficient=FletcherReevesCoefficient(), stopping_criterion=StopAfterIteration(15), ) diff --git a/test/solvers/test_convex_bundle_method.jl b/test/solvers/test_convex_bundle_method.jl index bc159432a7..f07a977954 100644 --- a/test/solvers/test_convex_bundle_method.jl +++ b/test/solvers/test_convex_bundle_method.jl @@ -8,8 +8,8 @@ using Manopt: convex_bundle_method_subsolver, convex_bundle_method_subsolver! diameter = floatmax() Ω = 0.0 cbms = ConvexBundleMethodState( - M, - p0; + M; + p=p0, atol_λ=1e0, diameter=diameter, domain=(M, q) -> distance(M, q, p0) < diameter / 2 ? 
true : false, diff --git a/test/solvers/test_cyclic_proximal_point.jl b/test/solvers/test_cyclic_proximal_point.jl index 39513be8f3..e6a3c7e7eb 100644 --- a/test/solvers/test_cyclic_proximal_point.jl +++ b/test/solvers/test_cyclic_proximal_point.jl @@ -22,7 +22,7 @@ using ManoptExamples: artificial_S1_signal, Lemniscate N, f, proxes, q3; λ=i -> π / (2 * i), stopping_criterion=StopAfterIteration(100) ) cpps = CyclicProximalPointState( - N, f; stopping_criterion=StopAfterIteration(1), λ=i -> π / (2 * i) + N; p=q, stopping_criterion=StopAfterIteration(1), λ=i -> π / (2 * i) ) mpo = ManifoldProximalMapObjective(f, proxes, [1, 2]) p = DefaultManoptProblem(N, mpo) @@ -120,7 +120,7 @@ using ManoptExamples: artificial_S1_signal, Lemniscate @testset "State access functions" begin M = Euclidean(3) p = ones(3) - O = CyclicProximalPointState(M, zeros(3)) + O = CyclicProximalPointState(M; p=zeros(3)) set_iterate!(O, p) @test get_iterate(O) == p end @@ -128,14 +128,14 @@ using ManoptExamples: artificial_S1_signal, Lemniscate io = IOBuffer() M = Euclidean(3) p = ones(3) - O = CyclicProximalPointState(M, p) + O = CyclicProximalPointState(M; p=p) f(M, p) = L2_Total_Variation(M, q, 0.5, p) proxes = ( (M, λ, p) -> prox_distance(M, λ, q, p), (M, λ, p) -> prox_Total_Variation(M, 0.5 * λ, p), ) s = CyclicProximalPointState( - M, f; stopping_criterion=StopAfterIteration(1), λ=i -> i + M; p=p, stopping_criterion=StopAfterIteration(1), λ=i -> i ) mpo = ManifoldProximalMapObjective(f, proxes, [1, 2]) dmp = DefaultManoptProblem(M, mpo) diff --git a/test/solvers/test_difference_of_convex.jl b/test/solvers/test_difference_of_convex.jl index c9b1e469dc..99e6231516 100644 --- a/test/solvers/test_difference_of_convex.jl +++ b/test/solvers/test_difference_of_convex.jl @@ -36,17 +36,19 @@ import Manifolds: inner @test X2 == X3 dca_sub_objective = ManifoldGradientObjective(dca_sub_cost, dca_sub_grad) dca_sub_problem = DefaultManoptProblem(M, dca_sub_objective) - dca_sub_state = 
GradientDescentState(M, copy(M, p0)) + dca_sub_state = GradientDescentState(M; p=copy(M, p0)) - dcs = DifferenceOfConvexState(M, copy(M, p0), dca_sub_problem, dca_sub_state) + dcs = DifferenceOfConvexState(M, dca_sub_problem, dca_sub_state; p=copy(M, p0)) @test Manopt.get_message(dcs) == "" + dcsc = DifferenceOfConvexState(M, f) + @test dcsc.sub_state isa Manopt.ClosedFormSubSolverState set_iterate!(dcs, M, p1) @test dcs.p == p1 set_gradient!(dcs, M, p1, X1) @test dcs.X == X1 - set_manopt_parameter!(dcs, :SubProblem, :X, X1) - set_manopt_parameter!(dcs, :SubState, :X, X1) + set_parameter!(dcs, :SubProblem, :X, X1) + set_parameter!(dcs, :SubState, :X, X1) dcppa_sub_cost = ProximalDCCost(g, copy(M, p0), 1.0) dcppa_sub_grad = ProximalDCGrad(grad_g, copy(M, p0), 1.0) @@ -60,7 +62,7 @@ import Manifolds: inner dcppa_sub_objective = ManifoldGradientObjective(dcppa_sub_cost, dcppa_sub_grad) dcppa_sub_problem = DefaultManoptProblem(M, dcppa_sub_objective) - dcppa_sub_state = GradientDescentState(M, copy(M, p0)) + dcppa_sub_state = GradientDescentState(M; p=copy(M, p0)) dcps = DifferenceOfConvexProximalState( #Initialize with random point M, @@ -71,6 +73,9 @@ import Manifolds: inner @test dcps.p == p1 set_gradient!(dcps, M, p1, X1) @test dcps.X == X1 + # Dummy closed form sub + dcpsc = DifferenceOfConvexProximalState(M, f) + @test dcpsc.sub_state isa Manopt.ClosedFormSubSolverState dc_cost_a = ManifoldDifferenceOfConvexObjective(f, grad_h) @test_throws ErrorException difference_of_convex_algorithm( diff --git a/test/solvers/test_exact_penalty.jl b/test/solvers/test_exact_penalty.jl index cededa377c..a018119484 100644 --- a/test/solvers/test_exact_penalty.jl +++ b/test/solvers/test_exact_penalty.jl @@ -42,11 +42,14 @@ using LinearAlgebra: I, tr # Dummy options mco = ManifoldCostObjective(f) dmp = DefaultManoptProblem(M, mco) - epms = ExactPenaltyMethodState(M, p0, dmp, NelderMeadState(M)) + epms = ExactPenaltyMethodState(M, dmp, NelderMeadState(M); p=p0) @test 
Manopt.get_message(epms) == "" set_iterate!(epms, M, 2 .* p0) @test get_iterate(epms) == 2 .* p0 @test startswith(repr(epms), "# Solver state for `Manopt.jl`s Exact Penalty Method\n") + # With dummy closed form solution + epmsc = ExactPenaltyMethodState(M, f) + @test epmsc.sub_state isa Manopt.ClosedFormSubSolverState @testset "Numbers" begin Me = Euclidean() fe(M, p) = (p + 5)^2 diff --git a/test/solvers/test_gradient_descent.jl b/test/solvers/test_gradient_descent.jl index 35c53deeb5..c51c8782db 100644 --- a/test/solvers/test_gradient_descent.jl +++ b/test/solvers/test_gradient_descent.jl @@ -26,8 +26,8 @@ using ManifoldDiff: grad_distance f, grad_f, data[1]; - stopping_criterion=StopAfterIteration(200) | StopWhenChangeLess(1e-16), - stepsize=ArmijoLinesearch(M; contraction_factor=0.99), + stopping_criterion=StopAfterIteration(200) | StopWhenChangeLess(M, 1e-16), + stepsize=ArmijoLinesearch(; contraction_factor=0.99), debug=d, record=[:Iteration, :Cost, 1], return_state=true, @@ -40,12 +40,11 @@ using ManifoldDiff: grad_distance f, grad_f, data[1]; - stopping_criterion=StopAfterIteration(200) | StopWhenChangeLess(1e-16), - stepsize=ArmijoLinesearch(M; contraction_factor=0.99), + stopping_criterion=StopAfterIteration(200) | StopWhenChangeLess(M, 1e-16), + stepsize=ArmijoLinesearch(; contraction_factor=0.99), ) @test p == p2 - step = NonmonotoneLinesearch( - M; + step = NonmonotoneLinesearch(; stepsize_reduction=0.99, sufficient_decrease=0.1, memory_size=2, @@ -59,30 +58,46 @@ using ManifoldDiff: grad_distance f, grad_f, data[1]; - stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(1e-16), + stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(M, 1e-16), stepsize=step, debug=[], # do not warn about increasing step here ) @test isapprox(M, p, p3; atol=1e-13) - step.strategy = :inverse + step2 = NonmonotoneLinesearch(; + stepsize_reduction=0.99, + sufficient_decrease=0.1, + memory_size=2, + bb_min_stepsize=1e-7, + bb_max_stepsize=π / 2, + 
strategy=:inverse, + stop_when_stepsize_exceeds=0.9 * π, + ) p4 = gradient_descent( M, f, grad_f, data[1]; - stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(1e-16), - stepsize=step, + stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(M, 1e-16), + stepsize=step2, debug=[], # do not warn about increasing step here ) @test isapprox(M, p, p4; atol=1e-13) - step.strategy = :alternating + step3 = NonmonotoneLinesearch(; + stepsize_reduction=0.99, + sufficient_decrease=0.1, + memory_size=2, + bb_min_stepsize=1e-7, + bb_max_stepsize=π / 2, + strategy=:alternating, + stop_when_stepsize_exceeds=0.9 * π, + ) p5 = gradient_descent( M, f, grad_f, data[1]; - stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(1e-16), - stepsize=step, + stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(M, 1e-16), + stepsize=step3, debug=[], # do not warn about increasing step here ) @test isapprox(M, p, p5; atol=1e-13) @@ -91,22 +106,24 @@ using ManifoldDiff: grad_distance f, grad_f, data[1]; - stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(1e-16), - direction=Nesterov(M, data[1]), + stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(M, 1e-16), + direction=Nesterov(M; p=copy(M, data[1])), ) @test isapprox(M, p, p6; atol=1e-13) - + M2 = Euclidean() @test_logs ( :warn, string( "The strategy 'indirect' is not defined. 
The 'direct' strategy is used instead.", ), - ) NonmonotoneLinesearch(Euclidean(); strategy=:indirect) - @test_throws DomainError NonmonotoneLinesearch(Euclidean(); bb_min_stepsize=0.0) - @test_throws DomainError NonmonotoneLinesearch( - Euclidean(); bb_min_stepsize=π / 2, bb_max_stepsize=π / 4 + ) NonmonotoneLinesearch(; strategy=:indirect)(M2) + @test_throws DomainError NonmonotoneLinesearch(Euclidean(); bb_min_stepsize=0.0)(M2) + @test_throws DomainError NonmonotoneLinesearch(; + bb_min_stepsize=π / 2, bb_max_stepsize=π / 4 + )( + M2 ) - @test_throws DomainError NonmonotoneLinesearch(Euclidean(); memory_size=0) + @test_throws DomainError NonmonotoneLinesearch(; memory_size=0)(M2) rec = get_record(s) # after one step for local enough data -> equal to real valued data @@ -133,13 +150,18 @@ using ManifoldDiff: grad_distance f, grad_f, pts[1]; - direction=MomentumGradient(M, copy(M, pts[1])), - stepsize=ConstantStepsize(M), + direction=MomentumGradient(), + stepsize=ConstantLength(), debug=[], # do not warn about increasing step here ) @test isapprox(M, north, n3) n4 = gradient_descent( - M, f, grad_f, pts[1]; direction=AverageGradient(M, copy(M, pts[1]); n=5) + M, + f, + grad_f, + pts[1]; + direction=AverageGradient(M; n=5), + stopping_criterion=StopAfterIteration(800), ) @test isapprox(M, north, n4; atol=1e-7) n5 = copy(M, pts[1]) @@ -164,14 +186,14 @@ using ManifoldDiff: grad_distance f(M, p) = distance(M, p, q) .^ 2 # choose a wrong gradient such that ConstantStepsize yields an increase grad_f(M, p) = -grad_distance(M, q, p) - @test_logs (:info,) set_manopt_parameter!(:Mode, "Tutorial") + @test_logs (:info,) Manopt.set_parameter!(:Mode, "Tutorial") @test_logs (:warn,) (:warn,) (:warn,) gradient_descent( - M, f, grad_f, 1 / sqrt(2) .* [1.0, -1.0, 0.0]; stepsize=ConstantStepsize(1.0) + M, f, grad_f, 1 / sqrt(2) .* [1.0, -1.0, 0.0]; stepsize=ConstantLength() ) grad_f2(M, p) = 20 * grad_distance(M, q, p) @test_logs (:warn,) (:warn,) gradient_descent( M, f, grad_f2, 
1 / sqrt(2) .* [1.0, -1.0, 0.0]; ) - @test_logs (:info,) set_manopt_parameter!(:Mode, "") + @test_logs (:info,) Manopt.set_parameter!(:Mode, "") end end diff --git a/test/solvers/test_interior_point_Newton.jl b/test/solvers/test_interior_point_Newton.jl index ddd646a97d..dcf681bdb7 100644 --- a/test/solvers/test_interior_point_Newton.jl +++ b/test/solvers/test_interior_point_Newton.jl @@ -1,6 +1,13 @@ using Manifolds, Manopt, LinearAlgebra, Random, Test @testset "Interior Point Newton Solver" begin + @testset "StepsizeState" begin + M = Manifolds.Sphere(2) + a = StepsizeState(M) + b = StepsizeState(a.p, a.X) + @test a.p === b.p + @test a.X === b.X + end @testset "A solver run on the Sphere" begin # We can take a look at debug prints of one run and plot the result # on CI and when running with ] test Manopt, both have to be set to false. @@ -17,6 +24,12 @@ using Manifolds, Manopt, LinearAlgebra, Random, Test Hess_g(M, p, X) = [(X * p')[:, i] for i in 1:3] M = Manifolds.Sphere(2) + # With dummy closed form solution + ipnsc = InteriorPointNewtonState( + M, ConstrainedManifoldObjective(f, grad_f; g=g, grad_g=grad_g, M=M), f + ) + @test ipnsc.sub_state isa Manopt.ClosedFormSubSolverState + p_0 = (1.0 / (sqrt(3.0))) .* [1.0, 1.0, 1.0] # p_0 = 1.0 / sqrt(2) .* [0.0, 1.0, 1.0] p_opt = [0.0, 0.0, 1.0] diff --git a/test/solvers/test_particle_swarm.jl b/test/solvers/test_particle_swarm.jl index 90a78fc49d..af5bc508f4 100644 --- a/test/solvers/test_particle_swarm.jl +++ b/test/solvers/test_particle_swarm.jl @@ -46,8 +46,8 @@ using Random p = DefaultManoptProblem(M, ManifoldCostObjective(f)) o = ParticleSwarmState(M, zero.(p_start), X_start) # test `set_iterate`` - Manopt.set_manopt_parameter!(o, :Population, p_start) - @test sum(norm.(Manopt.get_manopt_parameter(o, :Population) .- p_start)) == 0 + Manopt.set_parameter!(o, :Population, p_start) + @test sum(norm.(Manopt.get_parameter(o, :Population) .- p_start)) == 0 initialize_solver!(p, o) step_solver!(p, o, 1) for (p, v) in 
zip(o.swarm, o.velocity) diff --git a/test/solvers/test_primal_dual_semismooth_Newton.jl b/test/solvers/test_primal_dual_semismooth_Newton.jl index ead3aea0a1..ef33a596c9 100644 --- a/test/solvers/test_primal_dual_semismooth_Newton.jl +++ b/test/solvers/test_primal_dual_semismooth_Newton.jl @@ -1,4 +1,4 @@ -using Manopt, Manifolds, ManifoldsBase, Test +using Manopt, Manifolds, ManifoldsBase, Test, RecursiveArrayTools using ManoptExamples: adjoint_differential_forward_logs, differential_forward_logs, project_collaborative_TV using ManifoldDiff: differential_shortest_geodesic_startpoint diff --git a/test/solvers/test_proximal_bundle_method.jl b/test/solvers/test_proximal_bundle_method.jl index 79c296d50b..781935d0b5 100644 --- a/test/solvers/test_proximal_bundle_method.jl +++ b/test/solvers/test_proximal_bundle_method.jl @@ -5,7 +5,7 @@ import Manopt: proximal_bundle_method_subsolver, proximal_bundle_method_subsolve M = Hyperbolic(4) p = [0.0, 0.0, 0.0, 0.0, 1.0] p0 = [0.0, 0.0, 0.0, 0.0, -1.0] - pbms = ProximalBundleMethodState(M, p0; stopping_criterion=StopAfterIteration(200)) + pbms = ProximalBundleMethodState(M; p=p0, stopping_criterion=StopAfterIteration(200)) @test get_iterate(pbms) == p0 pbms.X = [1.0, 0.0, 0.0, 0.0, 0.0] diff --git a/test/solvers/test_quasi_Newton.jl b/test/solvers/test_quasi_Newton.jl index f6bb37e678..678bd0c545 100644 --- a/test/solvers/test_quasi_Newton.jl +++ b/test/solvers/test_quasi_Newton.jl @@ -33,8 +33,8 @@ end qnu = InverseBFGS() d = QuasiNewtonMatrixDirectionUpdate(M, qnu) @test Manopt.status_summary(d) == - "$(qnu) with initial scaling true and vector transport method ParallelTransport()." - s = "QuasiNewtonMatrixDirectionUpdate(DefaultOrthonormalBasis(ℝ), [1.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0; 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 1.0], true, InverseBFGS(), ParallelTransport())\n" + "$(qnu) with initial scaling 1.0 and vector transport method ParallelTransport()." 
+ s = "QuasiNewtonMatrixDirectionUpdate(DefaultOrthonormalBasis(ℝ), [1.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0; 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 1.0], 1.0, InverseBFGS(), ParallelTransport())\n" @test repr(d) == s @test Manopt.get_message(d) == "" end @@ -363,9 +363,9 @@ end fc(::Euclidean, p) = real(p' * A * p) grad_fc(::Euclidean, p) = 2 * A * p p0 = [2.0, 1 + im] - @test_logs (:info,) Manopt.set_manopt_parameter!(:Mode, "Tutorial") + @test_logs (:info,) Manopt.set_parameter!(:Mode, "Tutorial") p4 = quasi_Newton(M, fc, grad_fc, p0; stopoing_criterion=StopAfterIteration(3)) - @test_logs (:info,) Manopt.set_manopt_parameter!(:Mode, "") + @test_logs (:info,) Manopt.set_parameter!(:Mode, "") @test fc(M, p4) ≤ fc(M, p0) end @@ -376,7 +376,7 @@ end grad_f(M, p) = 2 * sum(p) gmp = ManifoldGradientObjective(f, grad_f) mp = DefaultManoptProblem(M, gmp) - qns = QuasiNewtonState(M, p) + qns = QuasiNewtonState(M; p=p) # push zeros to memory push!(qns.direction_update.memory_s, copy(p)) push!(qns.direction_update.memory_s, copy(p)) @@ -396,8 +396,8 @@ end gmp = ManifoldGradientObjective(f, grad_f) mp = DefaultManoptProblem(M, gmp) qns = QuasiNewtonState( - M, - copy(M, p); + M; + p=copy(M, p), direction_update=QuasiNewtonGradientDirectionUpdate(ParallelTransport()), nondescent_direction_behavior=:step_towards_negative_gradient, ) @@ -411,8 +411,8 @@ end ) solve!(mp, dqns) qns = QuasiNewtonState( - M, - copy(M, p); + M; + p=copy(M, p), direction_update=QuasiNewtonGradientDirectionUpdate(ParallelTransport()), nondescent_direction_behavior=:step_towards_negative_gradient, ) @@ -421,8 +421,8 @@ end @test qns.direction_update.num_times_init == 1 qns = QuasiNewtonState( - M, - copy(M, p); + M; + p=copy(M, p), direction_update=QuasiNewtonGradientDirectionUpdate(ParallelTransport()), nondescent_direction_behavior=:reinitialize_direction_update, ) diff --git a/test/solvers/test_stochastic_gradient_descent.jl b/test/solvers/test_stochastic_gradient_descent.jl index aee51e8208..a0c1a59e9a 100644 --- 
a/test/solvers/test_stochastic_gradient_descent.jl +++ b/test/solvers/test_stochastic_gradient_descent.jl @@ -26,8 +26,8 @@ using Manopt, Manifolds, Test end for x in pts] @testset "Constructors" begin - sg = StochasticGradient(M; p=p) - @test sg.dir == zero_vector(M, p) + sg = StochasticGradient(M; p=p)() + @test sg.X == zero_vector(M, p) msgo1 = ManifoldStochasticGradientObjective(sgrad_f1) dmp1 = DefaultManoptProblem(M, msgo1) @@ -77,7 +77,7 @@ using Manopt, Manifolds, Test @test_throws ErrorException get_gradients(dmp1i, p) @test_throws ErrorException get_gradient!(dmp1i, Z4, p, 1) sgds = StochasticGradientDescentState( - M, deepcopy(p), zero_vector(M, p); direction=StochasticGradient(deepcopy(p)) + M; p=deepcopy(p), X=zero_vector(M, p), direction=StochasticGradient(; p=p)(M) ) sgds.order = collect(1:5) sgds.order_type = :Linear @@ -109,9 +109,7 @@ using Manopt, Manifolds, Test sgrad_f1, p; order_type=:Random, - direction=AverageGradient( - M, p; n=10, direction=StochasticGradient(zero_vector(M, p)) - ), + direction=AverageGradient(M; p=p, n=10, direction=StochasticGradient()), ) @test is_point(M, q5, true) q6 = stochastic_gradient_descent( @@ -119,9 +117,7 @@ using Manopt, Manifolds, Test sgrad_f1, p; order_type=:Random, - direction=MomentumGradient( - M, p; direction=StochasticGradient(zero_vector(M, p)) - ), + direction=MomentumGradient(; p=p, direction=StochasticGradient()), ) @test is_point(M, q6, true) end diff --git a/test/solvers/test_subgradient_method.jl b/test/solvers/test_subgradient_method.jl index 392335b959..02ff408cb0 100644 --- a/test/solvers/test_subgradient_method.jl +++ b/test/solvers/test_subgradient_method.jl @@ -8,19 +8,22 @@ include("../utils/example_tasks.jl") p0 = [5.0, 2.0] q0 = [10.0, 5.0] sgs = SubGradientMethodState( - M, p0; stopping_criterion=StopAfterIteration(200), stepsize=ConstantStepsize(M) + M; + p=p0, + stopping_criterion=StopAfterIteration(200), + stepsize=Manopt.ConstantStepsize(M), ) sgs_ac = SubGradientMethodState( - 
M, - q0; + M; + p=q0, stopping_criterion=StopAfterIteration(200), - stepsize=ConstantStepsize(1.0, :absolute), + stepsize=Manopt.ConstantStepsize(M, 1.0; type=:absolute), ) sgs_ad = SubGradientMethodState( - M, - q0; + M; + p=q0, stopping_criterion=StopAfterIteration(200), - stepsize=DecreasingStepsize(1, 1, 0, 1, 0, :absolute), + stepsize=Manopt.DecreasingStepsize(M; length=1.0, type=:absolute), ) @test startswith(repr(sgs), "# Solver state for `Manopt.jl`s Subgradient Method\n") @test get_iterate(sgs) == p0 @@ -102,6 +105,10 @@ include("../utils/example_tasks.jl") Random.seed!(23) q4 = subgradient_method(M, f, ∂f!; evaluation=InplaceEvaluation()) @test isapprox(M, q4, p; atol=0.5) # random point -> not that close + # in-place + q5 = copy(M, p0) + subgradient_method!(M, f, ∂f!, q5; evaluation=InplaceEvaluation()) + @test isapprox(M, q3, q5) # Check Fallbacks of Problem @test get_cost(mp, q1) == 0.0 @test norm(M, q1, get_subgradient(mp, q1)) == 0 diff --git a/test/solvers/test_truncated_cg.jl b/test/solvers/test_truncated_cg.jl index 835db782f6..d72520a702 100644 --- a/test/solvers/test_truncated_cg.jl +++ b/test/solvers/test_truncated_cg.jl @@ -4,7 +4,7 @@ using Manifolds, Manopt, ManifoldsBase, Test M = Grassmann(3, 2) p = [1.0 0.0; 0.0 1.0; 0.0 0.0] η = zero_vector(M, p) - s = TruncatedConjugateGradientState(TangentSpace(M, p), η) + s = TruncatedConjugateGradientState(TangentSpace(M, p); X=η) @test startswith( repr(s), "# Solver state for `Manopt.jl`s Truncated Conjugate Gradient Descent\n" ) diff --git a/test/solvers/test_trust_regions.jl b/test/solvers/test_trust_regions.jl index 6e67e09953..6d1b153011 100644 --- a/test/solvers/test_trust_regions.jl +++ b/test/solvers/test_trust_regions.jl @@ -27,10 +27,10 @@ include("../utils/example_tasks.jl") TpM = TangentSpace(M, copy(M, p)) mho = ManifoldHessianObjective(f, rgrad, rhess) sub_problem = DefaultManoptProblem(TpM, TrustRegionModelObjective(mho)) - sub_state = TruncatedConjugateGradientState(TpM, 
get_gradient(M, mho, p)) + sub_state = TruncatedConjugateGradientState(TpM; X=get_gradient(M, mho, p)) trs1 = TrustRegionsState(M, sub_problem) trs2 = TrustRegionsState(M, sub_problem, sub_state) - trs3 = TrustRegionsState(M, p, sub_problem) + trs3 = TrustRegionsState(M, sub_problem; p=p) end @testset "Objective accessors" begin mho = ManifoldHessianObjective(f, rgrad, rhess) @@ -93,49 +93,24 @@ include("../utils/example_tasks.jl") M, f, rgrad, rhess, p, X; trust_region_radius=0.5 ) @test Y != X - Y2 = truncated_conjugate_gradient_descent( # with approximate Hessian - M, - f, - rgrad, - p, - X; - trust_region_radius=0.5, - ) - @test_broken isapprox(M, p, Y, Y2) # random point -> different result - Y3 = truncated_conjugate_gradient_descent( #random point and vector + Y2 = truncated_conjugate_gradient_descent( #random point and vector M, f, rgrad, rhess; trust_region_radius=0.5, ) - @test Y3 != X - Y4 = truncated_conjugate_gradient_descent( # 2 & 3 - M, - f, - rgrad; - trust_region_radius=0.5, - ) - @test Y4 != X - Y5 = truncated_conjugate_gradient_descent( # 2 & 3 - M, - f, - rgrad, - rhess, - p, - X; - trust_region_radius=0.5, + @test Y2 != X + Y3 = truncated_conjugate_gradient_descent( + M, f, rgrad, rhess, p, X; trust_region_radius=0.5 ) - @test Y5 != X - Y6 = copy(M, p, X) + @test Y3 != X + Y4 = copy(M, p, X) truncated_conjugate_gradient_descent!( - M, f, rgrad, rhess, p, Y6; trust_region_radius=0.5 + M, f, rgrad, rhess, p, Y4; trust_region_radius=0.5 ) - @test Y6 != X - Y7 = copy(M, p, X) - truncated_conjugate_gradient_descent!(M, f, rgrad, p, Y7; trust_region_radius=0.5) - @test Y7 != X + @test Y4 != X end @testset "Mutating" begin g = RGrad(M, A) @@ -358,23 +333,10 @@ include("../utils/example_tasks.jl") @test distance(Mc, pc_star, q[]) < 1e-2 q2 = trust_regions(Mc, fc, grad_fc, hess_fc) @test distance(Mc, pc_star, q[]) < 1e-2 - q2 = trust_regions(Mc, fc, grad_fc, hess_fc, 0.1; evaluation=InplaceEvaluation()) - @test distance(Mc, pc_star, q[]) < 1e-2 Y1 = 
truncated_conjugate_gradient_descent( Mc, fc, grad_fc, hess_fc, 0.1, 0.0; trust_region_radius=0.5 ) @test abs(Y1) ≈ 0.5 - Y1 = truncated_conjugate_gradient_descent( - Mc, - fc, - grad_fc, - hess_fc, - 0.1, - 0.0; - evaluation=InplaceEvaluation(), - trust_region_radius=0.5, - ) - @test abs(Y1) ≈ 0.5 end @testset "Euclidean Embedding" begin Random.seed!(42) diff --git a/test/test_aqua.jl b/test/test_aqua.jl index a145326063..abb59e0c43 100644 --- a/test/test_aqua.jl +++ b/test/test_aqua.jl @@ -1,19 +1,5 @@ using Aqua, Manopt, Test @testset "Aqua.jl" begin - Aqua.test_all( - Manopt; - ambiguities=( - exclude=[#For now: exclude some high-level functions, since in their - # different call schemes some ambiguities appear - # These should be carefully checked -> see also https://github.com/JuliaManifolds/Manopt.jl/issues/381 - Manopt.truncated_conjugate_gradient_descent, # ambiguity corresponds a problem with p and the Hessian and both being positional - Manopt.difference_of_convex_proximal_point, # should be fixed - Manopt.particle_swarm, # should be fixed - Manopt.stochastic_gradient_descent, # should be fixed - Manopt.truncated_conjugate_gradient_descent!, # should be fixed when removing deprecated methods - ], - broken=false, - ), - ) + Aqua.test_all(Manopt; ambiguities=(broken=false,)) end diff --git a/test/test_deprecated.jl b/test/test_deprecated.jl index 8e0c21f4d7..3b0b1585b4 100644 --- a/test/test_deprecated.jl +++ b/test/test_deprecated.jl @@ -1,29 +1,3 @@ using Manopt, Manifolds, ManifoldsBase, Test -@testset "test deprecated definitions still work" begin - @testset "outdated kwargs in constructors" begin - @test_logs (:warn,) DebugChange(; invretr=LogarithmicInverseRetraction()) - @test_logs (:warn,) DebugChange(; manifold=ManifoldsBase.DefaultManifold()) - @test_logs (:warn,) RecordChange(; manifold=ManifoldsBase.DefaultManifold()) - @test_logs (:warn,) StopWhenChangeLess(1e-9; manifold=Euclidean()) - end - - @testset "Outdated constrained accessors" begin - 
M = ManifoldsBase.DefaultManifold(3) - f(::ManifoldsBase.DefaultManifold, p) = norm(p)^2 - grad_f(M, p) = 2 * p - g(M, p) = [p[1] - 1, -p[2] - 1] - grad_g(M, p) = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]] - h(M, p) = [2 * p[3] - 1] - grad_h(M, p) = [[0.0, 0.0, 2.0]] - co = ConstrainedManifoldObjective( - ManifoldGradientObjective(f, grad_f); - equality_constraints=VectorGradientFunction(g, grad_g, 2), - inequality_constraints=VectorGradientFunction(h, grad_h, 1), - ) - dmp = DefaultManoptProblem(M, co) - p = [1.0, 2.0, 3.0] - @test_logs (:warn,) get_constraints(dmp, p) - @test_logs (:warn,) get_constraints(M, co, p) - end -end +@testset "test deprecated definitions still work" begin end diff --git a/test/utils/dummy_types.jl b/test/utils/dummy_types.jl index 9d5792a8a4..a06a8a55af 100644 --- a/test/utils/dummy_types.jl +++ b/test/utils/dummy_types.jl @@ -1,5 +1,5 @@ using Manopt, ManifoldsBase -import Manopt: get_iterate, set_manopt_parameter! +import Manopt: get_iterate, set_parameter! s = @isdefined _dummy_types_includeed if !s _dummy_types_includeed = true @@ -19,5 +19,5 @@ if !s end DummyState() = DummyState([]) get_iterate(::DummyState) = NaN - set_manopt_parameter!(s::DummyState, ::Val, v) = s + set_parameter!(s::DummyState, ::Val, v) = s end diff --git a/tutorials/ConstrainedOptimization.qmd b/tutorials/ConstrainedOptimization.qmd index 223227f5b5..4ed459adc1 100644 --- a/tutorials/ConstrainedOptimization.qmd +++ b/tutorials/ConstrainedOptimization.qmd @@ -122,7 +122,7 @@ Now as a first method we can just call the [Augmented Lagrangian Method](https:/ M, f, grad_f, p0; g=g, grad_g=grad_g, debug=[:Iteration, :Cost, :Stop, " | ", (:Change, "Δp : %1.5e"), 20, "\n"], stopping_criterion = StopAfterIteration(300) | ( - StopWhenSmallerOrEqual(:ϵ, 1e-5) & StopWhenChangeLess(1e-8) + StopWhenSmallerOrEqual(:ϵ, 1e-5) & StopWhenChangeLess(M, 1e-8) ) ); ``` @@ -166,7 +166,7 @@ We obtain M, f, grad_f!, p0; g=g2, grad_g=grad_g2!, evaluation=InplaceEvaluation(), 
debug=[:Iteration, :Cost, :Stop, " | ", (:Change, "Δp : %1.5e"), 20, "\n"], stopping_criterion = StopAfterIteration(300) | ( - StopWhenSmallerOrEqual(:ϵ, 1e-5) & StopWhenChangeLess(1e-8) + StopWhenSmallerOrEqual(:ϵ, 1e-5) & StopWhenChangeLess(M, 1e-8) ) ); ``` diff --git a/tutorials/GeodesicRegression.qmd b/tutorials/GeodesicRegression.qmd index 1f5e815775..4b410eef04 100644 --- a/tutorials/GeodesicRegression.qmd +++ b/tutorials/GeodesicRegression.qmd @@ -22,12 +22,10 @@ img_folder = "img/regression"; mkpath(img_folder) ``` ```{julia} - using Manopt, ManifoldDiff, Manifolds, Random, Colors - using LinearAlgebra: svd - using ManifoldDiff: grad_distance - Random.seed!(42); -``` -```{julia} +using Manopt, ManifoldDiff, Manifolds, Random, Colors, RecursiveArrayTools +using LinearAlgebra: svd +using ManifoldDiff: grad_distance +Random.seed!(42); ``` ```{julia} #| echo: false #| code-fold: true #| output: false @@ -197,8 +195,7 @@ y = gradient_descent( RegressionGradient!(data, t), x0; evaluation=InplaceEvaluation(), - stepsize=ArmijoLinesearch( - M; + stepsize=ArmijoLinesearch(; initial_stepsize=1.0, contraction_factor=0.990, sufficient_decrease=0.05, @@ -287,8 +284,7 @@ y2 = gradient_descent( RegressionGradient!(data2, t2), x1; evaluation=InplaceEvaluation(), - stepsize=ArmijoLinesearch( - M; + stepsize=ArmijoLinesearch(; initial_stepsize=1.0, contraction_factor=0.990, sufficient_decrease=0.05, @@ -458,8 +454,7 @@ y3 = alternating_gradient_descent( x2; evaluation=InplaceEvaluation(), debug=[:Iteration, " | ", :Cost, "\n", :Stop, 50], - stepsize=ArmijoLinesearch( - M; + stepsize=ArmijoLinesearch(; contraction_factor=0.999, sufficient_decrease=0.066, stop_when_stepsize_less=1e-11, diff --git a/tutorials/HowToRecord.qmd b/tutorials/HowToRecord.qmd index b4b764d07b..c396dddf34 100644 --- a/tutorials/HowToRecord.qmd +++ b/tutorials/HowToRecord.qmd @@ -123,8 +123,8 @@ We first generate the problem and the state, to also illustrate the low-level wo ```{julia} p = 
DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) s = GradientDescentState( - M, - copy(data[1]); + M; + p=copy(data[1]), stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(10.0^-9), ) ``` @@ -334,7 +334,7 @@ R3 = gradient_descent( RecordCount() => :Count, :Cost], ], - stepsize = ConstantStepsize(1.0), + stepsize = ConstantLength(1.0), stopping_criterion=StopAfterIteration(20), debug=[], return_state=true, diff --git a/tutorials/ImplementOwnManifold.qmd b/tutorials/ImplementOwnManifold.qmd index cf4a8f48f2..93fc77ae8e 100644 --- a/tutorials/ImplementOwnManifold.qmd +++ b/tutorials/ImplementOwnManifold.qmd @@ -48,7 +48,7 @@ Random.seed!(42) #| code-fold: true #| output: false # to keep the output and usage simple let's deactivate tutorial mode here -Manopt.set_manopt_parameter!(:Mode, "None") +Manopt.set_parameter!(:Mode, "None") ``` We also define the same manifold as in @@ -182,7 +182,7 @@ Let's discuss these in the next steps. ```{julia} q1 = gradient_descent(M, f, grad_f, p0; retraction_method = ProjectionRetraction(), # state, that we use the retraction from above - stepsize = DecreasingStepsize(M; length=1.0), # A simple step size + stepsize = DecreasingLength(M; length=1.0), # A simple step size stopping_criterion = StopAfterIteration(10), # A simple stopping criterion X = zeros(d+1), # how we define/represent tangent vectors ) diff --git a/tutorials/Optimize.qmd b/tutorials/Optimize.qmd index 714ba83c92..3a155ed82f 100644 --- a/tutorials/Optimize.qmd +++ b/tutorials/Optimize.qmd @@ -144,7 +144,7 @@ a little more expensive [`ArmijoLinesearch`](https://manoptjl.org/stable/plans/st m4 = gradient_descent(M, f, grad_f, data[1]; debug=[:Iteration,(:Change, "|Δp|: %1.9f |"), (:Cost, " F(x): %1.11f | "), "\n", :Stop, 2], - stepsize = ArmijoLinesearch(M; contraction_factor=0.999, sufficient_decrease=0.5), + stepsize = ArmijoLinesearch(; contraction_factor=0.999, sufficient_decrease=0.5), stopping_criterion = 
StopWhenGradientNormLess(1e-14) | StopAfterIteration(400), ) ``` @@ -163,7 +163,7 @@ It can be set using ```{julia} #| warning: true -Manopt.set_manopt_parameter!(:Mode, "Tutorial") +Manopt.set_parameter!(:Mode, "Tutorial") ``` to activate these. Continuing from the example before, one might argue, that the @@ -210,7 +210,7 @@ keyword, that in `Tutorial` mode contains additional warnings. The other option is to globally reset the `:Mode` back to ```{julia} #| warning: true -Manopt.set_manopt_parameter!(:Mode, "") +Manopt.set_parameter!(:Mode, "") ``` ## Example 2: computing the median of symmetric positive definite matrices diff --git a/tutorials/Project.toml b/tutorials/Project.toml index db4bdfaa9c..c90beab6b0 100644 --- a/tutorials/Project.toml +++ b/tutorials/Project.toml @@ -10,6 +10,7 @@ Manifolds = "1cead3c2-87b3-11e9-0ccd-23c62b72b94e" ManifoldsBase = "3362f125-f0bb-47a3-aa74-596ffd7ef2fb" Manopt = "0fc0a36d-df90-57f3-8f93-d78a9fc72bb5" Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" +RecursiveArrayTools = "731186ca-8d62-57ce-b412-fbd966d074cd" [compat] BenchmarkTools = "1" @@ -19,7 +20,8 @@ FiniteDifferences = "0.12" IJulia = "1" LRUCache = "1.4" ManifoldDiff = "0.3.9" -Manifolds = "0.8.81, 0.9" +Manifolds = "0.8.81, 0.9, 0.10" ManifoldsBase = "0.14.12, 0.15" -Manopt = "0.4.51" +Manopt = "0.5" Plots = "1.38" +RecursiveArrayTools = "2, 3" diff --git a/tutorials/StochasticGradientDescent.qmd b/tutorials/StochasticGradientDescent.qmd index 24f65d811f..9312fa3f8b 100644 --- a/tutorials/StochasticGradientDescent.qmd +++ b/tutorials/StochasticGradientDescent.qmd @@ -95,32 +95,32 @@ p_opt2 = stochastic_gradient_descent(M, gradf, p0) This result is reasonably close. 
But we can improve it by using a `DirectionUpdateRule`, namely: -On the one hand [`MomentumGradient`](https://manoptjl.org/stable/solvers/gradient_descent.html#Manopt.MomentumGradient), which requires both the manifold and the initial value, to keep track of the iterate and parallel transport the last direction to the current iterate. +On the one hand [`MomentumGradient`](@ref), which keeps track of the iterate and parallel transports the last direction to the current iterate; the manifold and start point it requires are now filled in by the solver. The necessary `vector_transport_method` keyword is set to a suitable default on every manifold, see ``[`default_vector_transport_method`](@extref `ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}`)``{=commonmark}. We get """ ```{julia} p_opt3 = stochastic_gradient_descent( - M, gradf, p0; direction=MomentumGradient(M, p0; direction=StochasticGradient(M)) + M, gradf, p0; direction=MomentumGradient(; direction=StochasticGradient()) ) ``` ```{julia} -MG = MomentumGradient(M, p0; direction=StochasticGradient(M)); -@benchmark stochastic_gradient_descent($M, $gradf, $p0; direction=$MG) +MG = MomentumGradient(; direction=StochasticGradient()); +@benchmark stochastic_gradient_descent($M, $gradf, p=$p0; direction=$MG) ``` And on the other hand the [`AverageGradient`](https://manoptjl.org/stable/solvers/gradient_descent.html#Manopt.AverageGradient) computes an average of the last `n` gradients. 
This is done by ```{julia} p_opt4 = stochastic_gradient_descent( - M, gradf, p0; direction=AverageGradient(M, p0; n=10, direction=StochasticGradient(M)), debug=[], + M, gradf, p0; direction=AverageGradient(; n=10, direction=StochasticGradient()), debug=[], ) ``` ```{julia} -AG = AverageGradient(M, p0; n=10, direction=StochasticGradient(M)); -@benchmark stochastic_gradient_descent($M, $gradf, $p0; direction=$AG, debug=[]) +AG = AverageGradient(; n=10, direction=StochasticGradient()); +@benchmark stochastic_gradient_descent($M, $gradf, p=$p0; direction=$AG, debug=[]) ``` Note that the default `StoppingCriterion` is a fixed number of iterations which helps the comparison here. @@ -131,19 +131,17 @@ since both rules can also be used with the `IdentityUpdateRule` within [`gradien For this not-that-large-scale example we can of course also use a gradient descent with `ArmijoLinesearch`, ```{julia} -fullGradF(M, p) = sum(grad_distance(M, q, p) for q in data) -p_opt5 = gradient_descent(M, F, fullGradF, p0; stepsize=ArmijoLinesearch(M)) +fullGradF(M, p) = 1/n*sum(grad_distance(M, q, p) for q in data) +p_opt5 = gradient_descent(M, F, fullGradF, p0; stepsize=ArmijoLinesearch()) ``` but in general it is expected to be a bit slow. ```{julia} -AL = ArmijoLinesearch(M); +AL = ArmijoLinesearch(); @benchmark gradient_descent($M, $F, $fullGradF, $p0; stepsize=$AL) ``` -Note that all 5 runs are very close to each other. - ## Technical details This tutorial is cached. It was last run on the following package versions.
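To summarize the pattern these tutorial changes apply throughout, here is a short sketch (not part of the diff itself) of the factory-based API this release introduces; it assumes Manopt 0.5 and Manifolds.jl are installed, and the cost, gradient, and data are illustrative stand-ins, not code from the tutorials:

```julia
# Sketch of the 0.5 ManifoldDefaultsFactory pattern (assumes Manopt 0.5, Manifolds.jl).
using Manopt, Manifolds, Random

Random.seed!(42)
M = Sphere(2)
data = [rand(M) for _ in 1:10]

# Riemannian mean: cost and gradient (the gradient of d(p, q)^2 / 2 is -log_p q).
f(M, p) = sum(distance(M, p, q)^2 for q in data) / (2 * length(data))
grad_f(M, p) = -sum(log(M, p, q) for q in data) / length(data)

# The step size is now a factory: no manifold argument is required here,
# since `gradient_descent` fills in `M` internally before producing the
# actual `ArmijoLinesearchStepsize`.
q = gradient_descent(M, f, grad_f, first(data); stepsize=ArmijoLinesearch())
```

The same idea carries the renamings above: `ConstantLength`, `DecreasingLength`, `MomentumGradient()`, and `StochasticGradient()` are all factories, so the manifold only has to be stated once, in the solver call.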