
Commit 480b17b

update arxiv links
1 parent d2cba5e commit 480b17b

10 files changed: +232 −28 lines

ConFIG/grad_operator.py

Lines changed: 4 additions & 2 deletions
@@ -82,7 +82,8 @@ def ConFIG_update(
         length_model (LengthModel, optional): The length model for rescaling the length of the final gradient.
             Defaults to ProjectionLength(), which will project each gradient vector onto the final gradient vector to get the final length.
         use_latest_square (bool, optional): Whether to use the latest square method for calculating the best direction.
-            If set to False, we will directly calculate the pseudo-inverse of the gradient matrix. Recommended to set to True. Defaults to True.
+            If set to False, we will directly calculate the pseudo-inverse of the gradient matrix. See `torch.linalg.pinv` and `torch.linalg.lstsq` for more details.
+            Recommended to set to True. Defaults to True.
         losses (Optional[Sequence], optional): The losses associated with the gradients.
             The losses will be passed to the weight and length model. If your weight/length model doesn't require loss information,
             you can set this value as None. Defaults to None.
@@ -185,7 +186,8 @@ class ConFIGOperator(GradientOperator):
         allow_simplified_model (bool, optional): Whether to allow simplified model for calculating the gradient.
             If set to True, will use simplified form of ConFIG method when there are only two losses (ConFIG_update_double). Defaults to True.
         use_latest_square (bool, optional): Whether to use the latest square method for calculating the best direction.
-            If set to False, we will directly calculate the pseudo-inverse of the gradient matrix. Recommended to set to True. Defaults to True.
+            If set to False, we will directly calculate the pseudo-inverse of the gradient matrix. See `torch.linalg.pinv` and `torch.linalg.lstsq` for more details.
+            Recommended to set to True. Defaults to True.

     Examples:
         ```python
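For context on the `torch.linalg.pinv` / `torch.linalg.lstsq` reference added above: the conflict-free direction comes from solving a small linear system built from the (normalized) loss-specific gradients. Below is a minimal sketch of the two solver routes; it is not the library's internal code, and the shapes and names are illustrative assumptions only.

```python
import torch

# Toy setup: m loss-specific gradients of an n-parameter model, stacked as rows.
m, n = 3, 10
G = torch.nn.functional.normalize(torch.randn(m, n), dim=1)  # unit-length gradient rows
w = torch.ones(m, 1)                                          # positive direction weights

# Route 1: explicit pseudo-inverse of the gradient matrix.
x_pinv = torch.linalg.pinv(G) @ w

# Route 2: least-squares solve of G x = w.
x_lstsq = torch.linalg.lstsq(G, w).solution

# For this full-rank toy case both routes should give the minimum-norm solution.
print(torch.allclose(x_pinv, x_lstsq, atol=1e-5))
```

The least-squares route avoids forming the pseudo-inverse explicitly, which is presumably why the docstring recommends it.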

README.md

Lines changed: 3 additions & 4 deletions
@@ -1,4 +1,3 @@
-> *** Note: The Arxiv paper is being uploaded and this repository is still being edited. ***
 <h1 align="center">
 <img src="./docs/assets/config.png" width="400"/>
 </h1>
@@ -7,7 +6,7 @@
 <h6 align="center">Towards Conflict-free Training for Everything and Everyone!</h6>

 <p align="center">
-[<a href="https://arxiv.org/abs/2312.05320">📄 Research Paper</a>][<a href="https://tum-pbs.github.io/ConFIG/">📖 Documentation & Examples</a>]
+[<a href="https://arxiv.org/abs/2408.11104">📄 Research Paper</a>][<a href="https://tum-pbs.github.io/ConFIG/">📖 Documentation & Examples</a>]
 </p>

 ## About
@@ -51,7 +50,7 @@ Then the dot product between $\boldsymbol{g}_{ConFIG}$ and each loss-specific gr

 ***Abstract:*** The loss functions of many learning problems contain multiple additive terms that can disagree and yield conflicting update directions. For Physics-Informed Neural Networks (PINNs), loss terms on initial/boundary conditions and physics equations are particularly interesting as they are well-established as highly difficult tasks. To improve learning the challenging multi-objective task posed by PINNs, we propose the ConFIG method, which provides conflict-free updates by ensuring a positive dot product between the final update and each loss-specific gradient. It also maintains consistent optimization rates for all loss terms and dynamically adjusts gradient magnitudes based on conflict levels. We additionally leverage momentum to accelerate optimizations by alternating the back-propagation of different loss terms. The proposed method is evaluated across a range of challenging PINN scenarios, consistently showing superior performance and runtime compared to baseline methods. We also test the proposed method in a classic multi-task benchmark, where the ConFIG method likewise exhibits a highly promising performance.

-***Read from:*** [[Arxiv](https://arxiv.org/abs/2312.05320)]
+***Read from:*** [[Arxiv](https://arxiv.org/abs/2408.11104)]

 ***Cite as:***

@@ -60,7 +59,7 @@ Then the dot product between $\boldsymbol{g}_{ConFIG}$ and each loss-specific gr
 author = {Qiang Liu and Mengyu Chu and Nils Thuerey},
 title = {ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks},
 year={2024},
-url={arXiv XXXX},
+url={https://arxiv.org/abs/2408.11104},
 }
 ```

docs/assets/algorithm.png

332 KB

docs/index.md

Lines changed: 3 additions & 5 deletions
@@ -1,17 +1,15 @@
 ---
 hide:
-- navigation
 - toc
 ---
-**Note: The Arxiv paper is being uploaded and this repository is still being edited.**

 <p align="center">
 <img src="./assets/config.png" width="400"/>
 </p>
 <h4 align="center">Towards Conflict-free Training for Everything and Everyone!</h4>

 <p align="center">
-[ <a href="https://arxiv.org/abs/2312.05320">📄 Research Paper</a> ][ <a href="https://github.com/tum-pbs/ConFIG"><img src="./assets/github.svg" width="16"> GitHub Repository</a> ]
+[ <a href="https://arxiv.org/abs/2408.11104">📄 Research Paper</a> ][ <a href="https://github.com/tum-pbs/ConFIG"><img src="./assets/github.svg" width="16"> GitHub Repository</a> ]
 </p>

 ---
@@ -59,7 +57,7 @@ Then the dot product between $\mathbf{g}_{ConFIG}$ and each loss-specific gradie

 ***Abstract:*** The loss functions of many learning problems contain multiple additive terms that can disagree and yield conflicting update directions. For Physics-Informed Neural Networks (PINNs), loss terms on initial/boundary conditions and physics equations are particularly interesting as they are well-established as highly difficult tasks. To improve learning the challenging multi-objective task posed by PINNs, we propose the ConFIG method, which provides conflict-free updates by ensuring a positive dot product between the final update and each loss-specific gradient. It also maintains consistent optimization rates for all loss terms and dynamically adjusts gradient magnitudes based on conflict levels. We additionally leverage momentum to accelerate optimizations by alternating the back-propagation of different loss terms. The proposed method is evaluated across a range of challenging PINN scenarios, consistently showing superior performance and runtime compared to baseline methods. We also test the proposed method in a classic multi-task benchmark, where the ConFIG method likewise exhibits a highly promising performance.

-***Read from:*** [[Arxiv](https://arxiv.org/abs/2312.05320)]
+***Read from:*** [[Arxiv](https://arxiv.org/abs/2408.11104)]

 ***Cite as:***

@@ -68,7 +66,7 @@ Then the dot product between $\mathbf{g}_{ConFIG}$ and each loss-specific gradie
 author = {Qiang Liu and Mengyu Chu and Nils Thuerey},
 title = {ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks},
 year={2024},
-url={arXiv XXXX},
+url={https://arxiv.org/abs/2408.11104},
 }
 ```

docs/start.md renamed to docs/start/start.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Get Started
+# Quick Start

 ## Installation

docs/start/theory.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Theory Introduction

Our ConFIG method aims to eliminate conflicts among multiple loss terms in gradient-descent optimization.

## ConFIG

Generically, we consider an optimization procedure with a set of $m$ individual loss functions, i.e., $\{\mathcal{L}_1,\mathcal{L}_2,\cdots,\mathcal{L}_m\}$. Let $\{\mathbf{g}_1,\mathbf{g}_2, \cdots, \mathbf{g}_m\}$ denote the individual gradients corresponding to each of the loss functions. A gradient-descent step with gradient $\mathbf{g}_c$ will conflict with the decrease of $\mathcal{L}_i$ if $\mathbf{g}_i^\top \mathbf{g}_c$ is **negative**. Thus, to ensure that all losses decrease simultaneously along $\mathbf{g}_c$, all $m$ components of $[\mathbf{g}_1,\mathbf{g}_2,\cdots, \mathbf{g}_m]^\top\mathbf{g}_c$ should be positive. This condition is fulfilled by setting $\mathbf{g}_c = [\mathbf{g}_1,\mathbf{g}_2,\cdots, \mathbf{g}_m]^{-\top} \mathbf{w}$, where $\mathbf{w}=[w_1,w_2,\cdots,w_m]$ is a vector with $m$ positive components and $M^{-\top}$ is the pseudoinverse of the transposed matrix $M^{\top}$.
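To make this concrete, here is a small numerical check of the conflict-free condition (a toy PyTorch sketch with made-up sizes, not the library implementation):

```python
import torch

torch.manual_seed(0)
m, n = 3, 8                       # three losses, eight model parameters (toy sizes)
G = torch.randn(m, n)             # row i is the loss-specific gradient g_i

w = torch.ones(m)                 # any vector with positive components
g_c = torch.linalg.pinv(G) @ w    # g_c = pseudoinverse of [g_1, ..., g_m]^T applied to w

# Each entry of G @ g_c is g_i^T g_c. For a full-rank G it reproduces w exactly,
# so every dot product is positive and a gradient-descent step along g_c
# decreases every loss term to first order.
print(G @ g_c)
```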

Although a positive $\mathbf{w}$ vector guarantees a conflict-free update direction for all losses, the specific value of $w_i$ further influences the exact direction of $\mathbf{g}_c$. To facilitate determining $\mathbf{w}$, we reformulate $\mathbf{g}_c$ as $\mathbf{g}_c=k[\mathcal{U}(\mathbf{g}_1),\mathcal{U}(\mathbf{g}_2),\cdots, \mathcal{U}(\mathbf{g}_m)]^{-\top} \mathbf{\hat{w}}$, where $\mathcal{U}(\mathbf{g}_i)=\mathbf{g}_i/(|\mathbf{g}_i|+\varepsilon)$ is a normalization operator and $k>0$. Now, $k$ controls the length of $\mathbf{g}_c$, and the ratio of $\mathbf{\hat{w}}$'s components corresponds to the ratio of $\mathbf{g}_c$'s projections onto each loss-specific $\mathbf{g}_i$, i.e., $|\mathbf{g}_c|\mathcal{S}_c(\mathbf{g}_c,\mathbf{g}_i)$, where $\mathcal{S}_c(\mathbf{g}_i,\mathbf{g}_j)=\mathbf{g}_i^\top\mathbf{g}_j/(|\mathbf{g}_i||\mathbf{g}_j|+\varepsilon)$ is the cosine-similarity operator:

$$
\frac{|\mathbf{g}_c|\mathcal{S}_c(\mathbf{g}_c,\mathbf{g}_i)}{|\mathbf{g}_c|\mathcal{S}_c(\mathbf{g}_c,\mathbf{g}_j)}
=\frac{\mathcal{S}_c(\mathbf{g}_c,\mathbf{g}_i)}{\mathcal{S}_c(\mathbf{g}_c,\mathbf{g}_j)}
=\frac{\mathcal{S}_c(\mathbf{g}_c,k\mathcal{U}(\mathbf{g}_i))}{\mathcal{S}_c(\mathbf{g}_c,k\mathcal{U}(\mathbf{g}_j))}
=\frac{[k\mathcal{U}(\mathbf{g}_i)]^\top \mathbf{g}_c}{[k\mathcal{U}(\mathbf{g}_j)]^\top \mathbf{g}_c}
=\frac{\hat{w}_i}{\hat{w}_j}
\quad \forall i,j \in [1,m].
$$

We call $\mathbf{\hat{w}}$ the **direction weight**. The projection length of $\mathbf{g}_c$ on each loss-specific gradient serves as an effective "learning rate" for each loss. Here, we choose $\hat{w}_i=\hat{w}_j \ \forall i,j \in [1,m]$ to ensure a uniform decrease rate of all losses, as it was shown to yield a weak form of Pareto optimality for multi-task learning.

Meanwhile, we introduce an adaptive strategy for the length of $\mathbf{g}_c$ rather than directly setting a fixed value of $k$. We notice that the length of $\mathbf{g}_c$ should increase when all loss-specific gradients point nearly in the same direction since it indicates a favorable direction for optimization. Conversely, when loss-specific gradients are close to opposing each other, the magnitude of $\mathbf{g}_c$ should decrease. We realize this by rescaling the length of $\mathbf{g}_c$ to the sum of the projection lengths of each loss-specific gradient on it, i.e., $|\mathbf{g}_c|=\sum_{i=1}^m|\mathbf{g}_i|\mathcal{S}_c(\mathbf{g}_i,\mathbf{g}_c)$.

The procedures above are summarized in the **Con**flict-**F**ree **I**nverse **G**radients (ConFIG) operator $\mathcal{G}$, and we correspondingly denote the final update gradient $\mathbf{g}_c$ as $\mathbf{g}_{\text{ConFIG}}$:

$$
\mathbf{g}_{\text{ConFIG}}=\mathcal{G}(\mathbf{g}_1,\mathbf{g}_2,\cdots,\mathbf{g}_m):=\left(\sum_{i=1}^m \mathbf{g}_i^\top\mathbf{g}_u\right)\mathbf{g}_u,
$$

$$
\mathbf{g}_u = \mathcal{U}\left[
[\mathcal{U}(\mathbf{g}_1),\mathcal{U}(\mathbf{g}_2),\cdots, \mathcal{U}(\mathbf{g}_m)]^{-\top} \mathbf{1}_m\right].
$$
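Taken literally, these two equations are only a few lines of tensor code. Here is a direct transcription (a toy sketch following the notation above, independent of the library; the epsilon handling is simplified):

```python
import torch

def config_direction(grads: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """grads: (m, n) tensor whose rows are the loss-specific gradients g_i."""
    unit = grads / (grads.norm(dim=1, keepdim=True) + eps)           # U(g_i), stacked as rows
    g_u = torch.linalg.pinv(unit) @ torch.ones(grads.shape[0], 1)    # [U(g_1),...,U(g_m)]^{-T} 1_m
    g_u = (g_u / (g_u.norm() + eps)).squeeze(1)                      # outer U(.)
    return (grads @ g_u).sum() * g_u                                 # (sum_i g_i^T g_u) g_u

# The resulting update has the same cosine similarity to every g_i,
# i.e. a uniform relative decrease rate across the loss terms.
g = torch.randn(3, 8)
g_config = config_direction(g)
print(torch.nn.functional.cosine_similarity(g, g_config.expand_as(g), dim=1))
```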

Here, $\mathbf{1}_m$ is the all-ones vector with $m$ components. These two equations are implemented as [ConFIG.grad_operator.ConFIG_update()](../../api/grad_operator/#ConFIG.grad_operator.ConFIG_update) and [ConFIG.grad_operator.ConFIGOperator.calculate_gradient()](../../api/grad_operator/#ConFIG.grad_operator.ConFIGOperator.calculate_gradient). We also provide a [weight_model](../../api/weight_model/) module that lets you implement different direction weights ($\hat{\mathbf{w}}=\mathbf{1}_m$ by default) and a [length_model](../../api/length_model/) module that lets you design different length projections (the adaptive strategy above by default). We encourage you to design and try different weight/length models and compare the results with the default configuration.
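In a training loop, the operator is applied to the flattened per-loss gradients between the backward passes and the optimizer step. A hedged usage sketch follows: the helper names `get_gradient_vector` and `apply_gradient_vector` from `ConFIG.utils`, and passing a plain list of gradient vectors to `ConFIG_update`, are assumptions based on this documentation rather than verified signatures.

```python
import torch
from ConFIG.grad_operator import ConFIG_update
from ConFIG.utils import get_gradient_vector, apply_gradient_vector  # assumed helpers

network = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
optimizer = torch.optim.Adam(network.parameters(), lr=1e-3)
x = torch.randn(32, 2)

# Two toy loss terms that pull the parameters in different directions.
loss_fns = [lambda y: (y - 1.0).pow(2).mean(),
            lambda y: (y + 1.0).pow(2).mean()]

for _ in range(100):
    grads = []
    for loss_fn in loss_fns:                  # one backward pass per loss term
        optimizer.zero_grad()
        loss_fn(network(x)).backward()
        grads.append(get_gradient_vector(network))        # flatten this loss's gradient
    apply_gradient_vector(network, ConFIG_update(grads))  # write the conflict-free gradient back
    optimizer.step()
```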

## ConFIG in the two-loss scenario

For the special case of only two loss terms, there is an equivalent form of ConFIG that does not require a pseudoinverse:

$$
\begin{align}
\mathcal{G}(\mathbf{g}_1,\mathbf{g}_2)=(\mathbf{g}_1^\top\mathbf{g}_{v}+\mathbf{g}_2^\top\mathbf{g}_{v}) \mathbf{g}_{v}
\\
\mathbf{g}_{v}=\mathcal{U}\left[\mathcal{U}(\mathcal{O}(\mathbf{g}_1,\mathbf{g}_2))+\mathcal{U}(\mathcal{O}(\mathbf{g}_2,\mathbf{g}_1))\right]
\end{align}
$$

where $\mathcal{O}(\mathbf{g}_1,\mathbf{g}_2)=\mathbf{g}_2-\frac{\mathbf{g}_1^\top\mathbf{g}_2}{|\mathbf{g}_1|^2}\mathbf{g}_1$ is the orthogonality operator. It returns the component of $\mathbf{g}_2$ orthogonal to $\mathbf{g}_1$ in the plane spanned by $\mathbf{g}_{1}$ and $\mathbf{g}_{2}$.

This equivalence is implemented as [ConFIG.grad_operator.ConFIG_update_double()](../../api/grad_operator/#ConFIG.grad_operator.ConFIG_update_double). You can also set `allow_simplified_model` to `True` in [ConFIG.grad_operator.ConFIGOperator](../../api/grad_operator/#ConFIG.grad_operator.ConFIGOperator) to use this form in the two-loss scenario.
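The two-loss form is easy to transcribe directly (a toy sketch of the equations above, not the library's `ConFIG_update_double` implementation):

```python
import torch

def orthogonal_component(g1: torch.Tensor, g2: torch.Tensor) -> torch.Tensor:
    """O(g1, g2): the part of g2 that is orthogonal to g1."""
    return g2 - (g1 @ g2) / (g1 @ g1) * g1

def config_update_two(g1: torch.Tensor, g2: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    unit = lambda v: v / (v.norm() + eps)                               # U(.)
    g_v = unit(unit(orthogonal_component(g1, g2)) + unit(orthogonal_component(g2, g1)))
    return (g1 @ g_v + g2 @ g_v) * g_v

g1, g2 = torch.randn(8), torch.randn(8)
g = config_update_two(g1, g2)
print(g1 @ g > 0, g2 @ g > 0)   # both dot products are positive: no conflict
```

For $m=2$ this should coincide, up to numerical error, with the general pseudoinverse form.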

## M-ConFIG

Gradient-based methods like the proposed ConFIG method require a separate backpropagation step to compute the gradient of each loss term, which can be computationally expensive. To address this, we introduce an accelerated, momentum-based variant of ConFIG: **M-ConFIG**. Our core idea is to perform the ConFIG operation on the momentum of each gradient and to update the momentum variables in an alternating fashion, avoiding backpropagating all losses in a single step. In each iteration, only a single momentum is updated with its corresponding gradient, while the others are carried over from previous steps. Algorithm 1 details the entire procedure of M-ConFIG.

![M-ConFIG](../assets/algorithm.png)

The M-ConFIG method is implemented as [ConFIG.momentum_operator.PseudoMomentumOperator](../../api/momentum_operator/#ConFIG.momentum_operator.PseudoMomentumOperator). This momentum acceleration can also be used with other gradient-based methods: in [ConFIG.momentum_operator.PseudoMomentumOperator](../../api/momentum_operator/#ConFIG.momentum_operator.PseudoMomentumOperator), you can modify the `gradient_operator` parameter to enable momentum acceleration for other methods.
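To illustrate the alternating-momentum idea, here is a from-scratch sketch of the scheme described above, written against plain PyTorch. It only illustrates the algorithmic idea; it is not the `PseudoMomentumOperator` API, and the hyperparameters, helper, and plain-SGD update below are made-up simplifications.

```python
import torch

def config_direction(grads: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Conflict-free direction from stacked (m, n) rows (same math as the earlier sketch)."""
    unit = grads / (grads.norm(dim=1, keepdim=True) + eps)
    g_u = torch.linalg.pinv(unit) @ torch.ones(grads.shape[0], 1)
    g_u = (g_u / (g_u.norm() + eps)).squeeze(1)
    return (grads @ g_u).sum() * g_u

network = torch.nn.Linear(4, 1)
params = list(network.parameters())
n_params = sum(p.numel() for p in params)
x = torch.randn(16, 4)
loss_fns = [lambda y: (y - 1.0).pow(2).mean(), lambda y: (y + 1.0).pow(2).mean()]

beta, lr = 0.9, 1e-2
momenta = torch.zeros(len(loss_fns), n_params)   # one momentum vector per loss term

for step in range(200):
    i = step % len(loss_fns)                     # round-robin: back-propagate only one loss
    network.zero_grad()
    loss_fns[i](network(x)).backward()
    g_i = torch.cat([p.grad.flatten() for p in params])
    momenta[i] = beta * momenta[i] + (1 - beta) * g_i   # refresh only momentum i
    update = config_direction(momenta)                  # ConFIG acts on the momenta
    with torch.no_grad():
        offset = 0
        for p in params:
            p -= lr * update[offset:offset + p.numel()].view_as(p)
            offset += p.numel()
```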

---

For a detailed discussion of the background theory, please check our [research paper](https://arxiv.org/abs/2408.11104).

install.sh

File mode changed: 100644 → 100755

mkdocs.yml

Lines changed: 17 additions & 13 deletions
@@ -11,11 +11,13 @@ theme:
   name: material
   features:
     #- navigation.sections # Sections are included in the navigation on the left.
-    - navigation.tabs # Tabs are included in the navigation on the left.
-    - toc.integrate # Table of contents is integrated on the left; does not appear separately on the right.
+    #- navigation.tabs # Tabs are included in the navigation on the left.
+    #- toc.integrate # Table of contents is integrated on the left; does not appear separately on the right.
+    - toc.follow
     - header.autohide # header disappears as you scroll
     - navigation.top
     - navigation.footer
+    - navigation.path
   palette:
     - scheme: default
       primary: brown
@@ -83,14 +85,16 @@ plugins:

 nav:
   - 'Home': 'index.md'
-  - 'Get Started': 'start.md'
-  - 'Examples':
-    - 'Toy Example of Muti-task Learning': 'examples/mtl_toy.ipynb'
-    - "Solve Burgers' Equation with PINN": 'examples/pinn_burgers.ipynb'
-  - 'API Reference':
-    - "Gradient Operator": 'api/grad_operator.md'
-    - "Momentum Operator": 'api/momentum_operator.md'
-    - "Weight Model": 'api/weight_model.md'
-    - "Length Model": 'api/length_model.md'
-    - "Loss Recorder": 'api/loss_recorder.md'
-    - "Utils": 'api/utils.md'
+  - '1. Get Started':
+    - '1.1. Quick Start': 'start/start.md'
+    - '1.2. Theory Introduction': 'start/theory.md'
+  - '2. Examples':
+    - '2.1. Toy Example of Muti-task Learning': 'examples/mtl_toy.ipynb'
+    - "2.2. Solve Burgers' Equation with PINN": 'examples/pinn_burgers.ipynb'
+  - '3. API Reference':
+    - "3.1. Gradient Operator": 'api/grad_operator.md'
+    - "3.2. Momentum Operator": 'api/momentum_operator.md'
+    - "3.3. Weight Model": 'api/weight_model.md'
+    - "4.4. Length Model": 'api/length_model.md'
+    - "4.5. Loss Recorder": 'api/loss_recorder.md'
+    - "4.6. Utils": 'api/utils.md'
