All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. This changelog does not include internal changes that do not affect the user.
- Added the function `torchjd.autojac.jac`. It is the same as `torchjd.autojac.backward`, except that it returns the Jacobians as a tuple instead of storing them in the `.jac` fields of the inputs. Its interface is analogous to that of `torch.autograd.grad`.
- Added a `jac_tensors` parameter to `backward`, allowing the Jacobian computation to be pre-multiplied by initial Jacobians. This enables multi-step chain rule computations and is analogous to the `grad_tensors` parameter of `torch.autograd.backward`.
- Added a `grad_tensors` parameter to `mtl_backward`, allowing the use of non-scalar `losses` (now renamed to `tensors`). This is analogous to the `grad_tensors` parameter of `torch.autograd.backward`. When using scalar losses, the usage does not change.
- Added a `jac_outputs` parameter to `jac`, allowing the Jacobian computation to be pre-multiplied by initial Jacobians. This is analogous to the `grad_outputs` parameter of `torch.autograd.grad`.
- Added a `scale_mode` parameter to `AlignedMTL` and `AlignedMTLWeighting`, allowing a choice between `"min"`, `"median"`, and `"rmse"` scaling.
- Added an attribute `gramian_weighting` to all aggregators that use a Gramian-based `Weighting`. Usage is still the same; `aggregator.gramian_weighting` is just an alias for the (quite confusing) `aggregator.weighting.weighting` field.
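The pre-multiplication enabled by `jac_tensors` / `jac_outputs` is the matrix chain rule: if `J_g` is the Jacobian of `g` at `x` and `J_f` is the Jacobian of `f` at `g(x)`, the Jacobian of `f ∘ g` is `J_f @ J_g`. A minimal pure-Python sketch of that composition (illustrative only, not using torchjd):

```python
# Chain rule for Jacobians on tiny dense matrices. Supplying an "initial
# Jacobian" (here J_f) to a Jacobian computation of g amounts to this
# pre-multiplication: J_{f∘g} = J_f @ J_g.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# g: R^2 -> R^2, g(x, y) = (2x, 3y)  =>  J_g = [[2, 0], [0, 3]]
# f: R^2 -> R,   f(u, v) = u + v     =>  J_f = [[1, 1]]
J_g = [[2.0, 0.0], [0.0, 3.0]]
J_f = [[1.0, 1.0]]

J_fg = matmul(J_f, J_g)  # Jacobian of f∘g: [[2.0, 3.0]]
```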
- BREAKING: Removed from `backward` and `mtl_backward` the responsibility to aggregate the Jacobian. Now, these functions compute and populate the `.jac` fields of the parameters, and a new function `torchjd.autojac.jac_to_grad` should then be called to aggregate those `.jac` fields into `.grad` fields. This gives users more control over what they do with the Jacobians (they can easily aggregate them group by group, or even parameter by parameter), but it now requires an extra line of code to perform the Jacobian descent step. To update, please change:
  ```python
  backward(losses, aggregator)
  ```
  to
  ```python
  backward(losses)
  jac_to_grad(model.parameters(), aggregator)
  ```
  and
  ```python
  mtl_backward(losses, features, aggregator)
  ```
  to
  ```python
  mtl_backward(losses, features)
  jac_to_grad(shared_module.parameters(), aggregator)
  ```
- BREAKING: Made some parameters of the public interface of `torchjd` positional-only or keyword-only:
  - `backward`: The `tensors` parameter is now positional-only. Suggested change: `backward(tensors=losses)` => `backward(losses)`. All other parameters are now keyword-only.
  - `mtl_backward`: The `tensors` parameter (previously named `losses`) is now positional-only. Suggested change: `mtl_backward(losses=losses, features=features)` => `mtl_backward(losses, features=features)`. The `features` parameter remains usable as positional or keyword. All other parameters are now keyword-only.
  - `Aggregator.__call__`: The `matrix` parameter is now positional-only. Suggested change: `aggregator(matrix=matrix)` => `aggregator(matrix)`.
  - `Weighting.__call__`: The `stat` parameter is now positional-only. Suggested change: `weighting(stat=gramian)` => `weighting(gramian)`.
  - `GeneralizedWeighting.__call__`: The `generalized_gramian` parameter is now positional-only. Suggested change: `generalized_weighting(generalized_gramian=generalized_gramian)` => `generalized_weighting(generalized_gramian)`.
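Positional-only (`/`) and keyword-only (`*`) markers are standard Python syntax. A minimal sketch of the calling convention these changes enforce, using a hypothetical signature rather than torchjd's actual one:

```python
# Hypothetical function mimicking the new convention: the first parameter is
# positional-only (before `/`), and the rest are keyword-only (after `*`).
def backward_like(tensors, /, *, inputs=None, retain_graph=False):
    return (tensors, inputs, retain_graph)

backward_like([1.0, 2.0])                 # OK: tensors passed positionally
backward_like([1.0, 2.0], inputs=["w"])   # OK: other params by keyword

try:
    backward_like(tensors=[1.0, 2.0])     # rejected: tensors is positional-only
except TypeError as e:
    print("rejected:", e)
```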
- Removed several unnecessary memory duplications. This should significantly improve the memory efficiency and speed of `autojac`.
- Increased the lower bounds of the torch (from 2.0.0 to 2.3.0) and numpy (from 1.21.0 to 1.21.2) dependencies to reflect what really works with torchjd. We now also run torchjd's tests with the dependency lower bounds specified in `pyproject.toml`, so we should now always accurately reflect the actual lower bounds.
- Added `__all__` in the `__init__.py` of packages. This should prevent PyLance from triggering warnings when importing from `torchjd`.
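`__all__` declares which names a package exports via `from package import *`, which is also what static checkers such as PyLance treat as the public API. A minimal sketch of the effect, using a made-up module rather than torchjd's actual `__init__.py`:

```python
import types

# Build a throwaway module whose __all__ exposes only one of its two names.
mod = types.ModuleType("fake_pkg")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn():\n"
    "    return 'public'\n"
    "def _helper():\n"
    "    return 'private'\n",
    mod.__dict__,
)

# `from fake_pkg import *` would bind exactly the names listed in __all__.
exported = {name: getattr(mod, name) for name in mod.__all__}
assert "public_fn" in exported
assert "_helper" not in exported
```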
- Added the `autogram` package, with the `autogram.Engine`. This is an implementation of Algorithm 3 from Jacobian Descent for Multi-Objective Optimization, optimized for batched computations, as in IWRM. Generalized Gramians can also be obtained by using the autogram engine on a tensor of losses of arbitrary shape.
- For all `Aggregator`s based on the weighting of the Gramian of the Jacobian, made their `Weighting` class public. It can be used directly on a Gramian (computed via the `autogram.Engine`) to extract weights. The list of new public classes is:
  - `Weighting` (abstract base class)
  - `UPGradWeighting`
  - `AlignedMTLWeighting`
  - `CAGradWeighting`
  - `ConstantWeighting`
  - `DualProjWeighting`
  - `IMTLGWeighting`
  - `KrumWeighting`
  - `MeanWeighting`
  - `MGDAWeighting`
  - `PCGradWeighting`
  - `RandomWeighting`
  - `SumWeighting`
- Added `GeneralizedWeighting` (base class) and `Flattening` (implementation) to extract tensors of weights from generalized Gramians.
- Added usage example for IWRM with autogram.
- Added usage example for IWRM with partial autogram.
- Added usage example for IWMTL with autogram.
- Added Python 3.14 classifier in `pyproject.toml` (we now also run tests on Python 3.14 in the CI).
- Removed an unnecessary internal reshape when computing Jacobians. This should have no effect other than a slight performance improvement in `autojac`.
- Revamped documentation.
- Made `backward` and `mtl_backward` importable from `torchjd.autojac` (as was the case prior to 0.7.0).
- Deprecated importing `backward` and `mtl_backward` from `torchjd` directly.
- BREAKING: Changed the dependencies of `CAGrad` and `NashMTL` to be optional when installing TorchJD. Users of these aggregators will have to use `pip install "torchjd[cagrad]"`, `pip install "torchjd[nash_mtl]"` or `pip install "torchjd[full]"` to install TorchJD alongside those dependencies. This should make TorchJD more lightweight.
- BREAKING: Made the aggregator modules and the `autojac` package protected. The aggregators must now always be imported via their package (e.g. `from torchjd.aggregation.upgrad import UPGrad` must be changed to `from torchjd.aggregation import UPGrad`). The `backward` and `mtl_backward` functions must now always be imported directly from the `torchjd` package (e.g. `from torchjd.autojac.mtl_backward import mtl_backward` must be changed to `from torchjd import mtl_backward`).
- Removed the check that the input Jacobian matrix provided to an aggregator does not contain `nan`, `inf` or `-inf` values. This check was costly in memory and time for large matrices, so removing it should improve performance. However, if the optimization diverges for some reason (for instance due to a too-large learning rate), the resulting exceptions may come from other sources.
- Removed some runtime checks on the shapes of the internal tensors used by the `autojac` engine. This should lead to a small performance improvement.
- Made some aggregators (`CAGrad`, `ConFIG`, `DualProj`, `GradDrop`, `IMTLG`, `NashMTL`, `PCGrad` and `UPGrad`) raise a `NonDifferentiableError` whenever one tries to differentiate through them. Before this change, trying to differentiate through them led to wrong gradients or unclear errors.
- Added a `py.typed` file in the top package of `torchjd` to ensure compliance with PEP 561. This should make it possible for users to run mypy against the type annotations provided in `torchjd`.
- Added usage example showing how to combine TorchJD with automatic mixed precision (AMP).
- Refactored the underlying optimization problem that `UPGrad` and `DualProj` have to solve to project onto the dual cone. This should slightly improve the performance and precision of these aggregators.
- Refactored internal verifications in the `autojac` engine so that they no longer run at runtime. This should minimally improve the performance and reduce the memory usage of `backward` and `mtl_backward`.
- Refactored internal typing in the `autojac` engine so that fewer casts are made and the code is simplified. This should slightly improve the performance of `backward` and `mtl_backward`.
- Improved the implementation of `ConFIG` to be simpler and safer when normalizing vectors. This should slightly improve the performance of `ConFIG` and minimally affect its behavior.
- Simplified the normalization of the Gramian in `UPGrad`, `DualProj` and `CAGrad`. This should slightly improve their performance and precision.
- Fixed an issue with `backward` and `mtl_backward` that could make the ordering of the columns of the Jacobians non-deterministic, and could thus lead to slightly non-deterministic results with some aggregators.
- Removed arbitrary exception handling in `IMTLG` and `AlignedMTL` when the computation fails. In practice, this fix should only affect some matrices with extremely large values, which should not usually occur.
- Fixed a bug in `NashMTL` that made it fail (due to a type mismatch) when `update_weights_every` was more than 1.
- Added new aggregator `ConFIG`, from ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks.
- Added Python 3.13 classifier in `pyproject.toml` (we now also run tests on Python 3.13 in the CI).
- Fixed a bug introduced in v0.4.0 that could cause `backward` and `mtl_backward` to fail with some tensor shapes.
- Changed how the Jacobians are computed when calling `backward` or `mtl_backward` with `parallel_chunk_size=1` so as not to rely on `torch.autograd.vmap` in this case. Whenever `vmap` does not support something (compiled functions, RNN on CUDA, etc.), users should now be able to avoid using `vmap` by calling `backward` or `mtl_backward` with `parallel_chunk_size=1`.
- Changed the effect of the `retain_graph` parameter of `backward` and `mtl_backward`. When set to `False`, it now frees the graph only after all gradients have been computed. In most cases, users should now leave the default value `retain_graph=False`, no matter the value of `parallel_chunk_size`. This reduces the memory overhead.
- Added an RNN training usage example to the documentation.
- Improved the performance of the graph traversal function called by `backward` and `mtl_backward` to find the tensors with respect to which differentiation should be done. It now visits every node at most once.
- Added a default value to the `inputs` parameter of `backward`. If not provided, the `inputs` will default to all leaf tensors that were used to compute the `tensors` parameter. This is in line with the behavior of `torch.autograd.backward`.
- Added a default value to the `shared_params` and `tasks_params` arguments of `mtl_backward`. If not provided, `shared_params` will default to all leaf tensors that were used to compute the `features`, and `tasks_params` will default to all leaf tensors that were used to compute each of the `losses`, excluding those used to compute the `features`.
- Added a note in the documentation about the incompatibility of `backward` and `mtl_backward` with tensors that retain grad.
- BREAKING: Changed the name of the parameter `A` to `aggregator` in `backward` and `mtl_backward`.
- BREAKING: Changed the order of the parameters of `backward` and `mtl_backward` to make it possible to have a default value for `inputs` and for `shared_params` and `tasks_params`, respectively. Usages of `backward` and `mtl_backward` that rely on the order of arguments must be updated.
- Switched to the PEP 735 dependency groups format in `pyproject.toml` (from a `[tool.pdm.dev-dependencies]` section to a `[dependency-groups]` section). This should only affect development dependencies.
- BREAKING: Added a check in `mtl_backward` to ensure that `tasks_params` and `shared_params` have no overlap. Previously, the behavior in this scenario was quite arbitrary.
- Added a PyTorch Lightning integration example.
- Added an explanation about Jacobian descent in the README.
- Made the dependency on ecos explicit in `pyproject.toml` (before `cvxpy` 1.16.0, it was installed automatically when installing `cvxpy`).
- Removed the upper cap on the `numpy` version in the dependencies. This makes `torchjd` compatible with the most recent numpy versions too.
- Prevented `IMTLG` from dividing by zero during its weight rescaling step. If the input matrix consists only of zeros, it will now return a vector of zeros instead of a vector of `nan`.
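A minimal pure-Python sketch of this kind of zero-guard (illustrative only, not torchjd's actual implementation, which operates on torch tensors):

```python
# Hypothetical weight rescaling with a zero-division guard: if the weights
# sum to zero (e.g. from an all-zero input matrix), return zeros instead of
# producing nan via 0/0.

def rescale_weights(weights):
    total = sum(weights)
    if total == 0.0:
        return [0.0 for _ in weights]  # guard: avoid 0/0 -> nan
    return [w / total for w in weights]

print(rescale_weights([1.0, 3.0]))  # -> [0.25, 0.75]
print(rescale_weights([0.0, 0.0]))  # -> [0.0, 0.0], not [nan, nan]
```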
- Added the `autojac` package containing the backward pass functions and their dependencies.
- Added the `mtl_backward` function to make a backward pass for multi-task learning.
- Added a multi-task learning example.
- BREAKING: Moved the `backward` module to the `autojac` package. Some imports may have to be adapted.
- Improved documentation of `backward`.
- Fixed wrong tensor device with `IMTLG` in some rare cases.
- BREAKING: Removed the possibility of populating the `.grad` field of a tensor that does not expect it when calling `backward`. If an input `t` provided to `backward` does not satisfy `t.requires_grad and (t.is_leaf or t.retains_grad)`, an error is now raised.
- BREAKING: When using `backward`, aggregations are now accumulated into the `.grad` fields of the inputs rather than replacing those fields if they already exist. This is in line with the behavior of `torch.autograd.backward`.
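Accumulation versus replacement can be illustrated without torch. A minimal sketch with hypothetical names, mirroring the `.grad`-accumulation semantics described above:

```python
# Hypothetical parameter object illustrating "accumulate into .grad"
# semantics: a new gradient is added to an existing one, not overwritten.

class Param:
    def __init__(self):
        self.grad = None

def accumulate_grad(param, new_grad):
    # Mirrors torch.autograd.backward's behavior: accumulate, don't replace.
    param.grad = new_grad if param.grad is None else param.grad + new_grad

p = Param()
accumulate_grad(p, 1.5)
accumulate_grad(p, 2.0)
print(p.grad)  # -> 3.5 (accumulated), rather than 2.0 (replaced)
```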
- Basic project structure.
- Added the `aggregation` package:
  - `Aggregator` base class to aggregate Jacobian matrices.
  - `AlignedMTL` from Independent Component Alignment for Multi-Task Learning.
  - `CAGrad` from Conflict-Averse Gradient Descent for Multi-task Learning.
  - `Constant` to aggregate with constant weights.
  - `DualProj` adapted from Gradient Episodic Memory for Continual Learning.
  - `GradDrop` from Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout.
  - `IMTLG` from Towards Impartial Multi-task Learning.
  - `Krum` from Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent.
  - `Mean` to average the rows of the matrix.
  - `MGDA` from Multiple-gradient descent algorithm (MGDA) for multiobjective optimization.
  - `NashMTL` from Multi-Task Learning as a Bargaining Game.
  - `PCGrad` from Gradient Surgery for Multi-Task Learning.
  - `Random` from Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning.
  - `Sum` to sum the rows of the matrix.
  - `TrimmedMean` from Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates.
  - `UPGrad` from Jacobian Descent for Multi-Objective Optimization.
- Added the `backward` function to perform a step of Jacobian descent.
- Added documentation of the public API and of some usage examples.
- Tests:
- Unit tests.
- Documentation tests.
- Plotting utilities to verify qualitatively that aggregators work as expected.