Our datasets are generated with the following procedures.
We follow the DIR setup and report the mean and standard deviation of test accuracy over 5 runs. The running scripts are spmotif-struc.sh and spmotif-mixed.sh. The parameter search spaces are [0.5,1,2,4,8,16,32] and [0.5,1,2] for $\alpha$ and $\beta$, respectively.
Experiments on these datasets run on NVIDIA TITAN Xp and NVIDIA RTX 2080Ti graphics cards with CUDA 10.2.
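The hyper-parameter search above is a plain grid over the two spaces. A minimal sketch of that loop is below; `run_experiment`, and the idea that it wraps the shell scripts, are illustrative assumptions, not the repo's actual API:

```python
# Hypothetical sketch of the alpha/beta grid search described above.
# run_experiment is a placeholder, not part of the repository.
from itertools import product

ALPHAS = [0.5, 1, 2, 4, 8, 16, 32]  # search space for alpha
BETAS = [0.5, 1, 2]                 # search space for beta

def run_experiment(alpha, beta):
    # Placeholder: in practice this would launch spmotif-struc.sh or
    # spmotif-mixed.sh with the given hyper-parameters and return the
    # mean test accuracy over 5 runs.
    return 0.0

# Pick the (alpha, beta) pair with the best (placeholder) score.
best = max(product(ALPHAS, BETAS), key=lambda ab: run_experiment(*ab))
print(f"best (alpha, beta) = {best}")
```

With all placeholder scores equal, `max` simply returns the first pair; with real scores it returns the best-performing configuration out of the 7 x 3 = 21 candidates.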
We follow the DrugOOD setup and report the mean and standard deviation of test AUC-ROC over 5 runs. The running script is drugood.sh. The parameter search spaces are [0.5,1,2,4,8,16,32] and [0.5,1,2] for $\alpha$ and $\beta$, respectively.
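The reporting convention (mean and standard deviation over 5 runs) can be sketched as follows; the scores are made-up placeholders, not results from the paper:

```python
# Sketch of the "mean +/- std over 5 runs" reporting convention.
# The AUC-ROC values below are hypothetical placeholders.
from statistics import mean, stdev

scores = [0.78, 0.80, 0.79, 0.81, 0.77]  # one score per run
print(f"test AUC-ROC: {mean(scores):.4f} +/- {stdev(scores):.4f}")
```

Note that `statistics.stdev` computes the sample standard deviation (dividing by n-1), which is the usual convention when reporting over a small number of runs.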
The running script is others.sh. We report the mean and standard deviation of test accuracy over 5 runs.
For CMNIST-sp, the parameter search space is [1,2,4,8,16,32] for both $\alpha$ and $\beta$. For the remaining datasets, the parameter search spaces are [1,2,4,6,8] and [0.5,1,2] for $\alpha$ and $\beta$, respectively.
We follow the size-invariant-GNNs setup and report the mean and standard deviation of the Matthews correlation coefficient over 10 runs. The running script is tu_datasets.sh. The parameter search space is [0.5,1,2] for both $\alpha$ and $\beta$.
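For reference, the Matthews correlation coefficient reported here can be computed from the binary confusion counts with the standard formula; the counts below are made up for illustration:

```python
# Matthews correlation coefficient from binary confusion counts
# (standard formula; the example counts are hypothetical).
from math import sqrt

def mcc(tp, tn, fp, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(40, 45, 5, 10))  # a moderately good binary classifier
```

MCC ranges in [-1, 1] and, unlike accuracy, stays informative under class imbalance, which is why it is a common choice for these benchmarks.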
- We suggest re-searching the hyper-parameters from the same space if the software or hardware setup changes.
- In fact, using a finer-grained hyper-parameter search space for $\alpha$ and $\beta$ can yield better results (cf. the ablation studies in the experiments and Appendix G.4).
- Using more powerful GNNs as backbones is also likely to improve performance (cf. the failure case studies in Appendix D.2).
- We also empirically observe high variances on datasets involving graph size shifts, which aligns with the benchmarking results of the GOOD benchmark. Therefore, we believe a lower variance could also be a good indicator of OOD performance.
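The last point above can be illustrated with a toy comparison: when two configurations reach a similar mean over runs, the one with the lower spread may be preferable under distribution shift. All values below are made up:

```python
# Toy illustration of preferring lower run-to-run variance among
# configurations with similar means (all values are hypothetical).
from statistics import mean, stdev

runs_a = [0.70, 0.71, 0.69, 0.70, 0.70]  # stable configuration
runs_b = [0.78, 0.62, 0.75, 0.60, 0.75]  # similar mean, high variance

for name, runs in [("A", runs_a), ("B", runs_b)]:
    print(f"{name}: {mean(runs):.3f} +/- {stdev(runs):.3f}")
```

Both configurations average about 0.70, but configuration A's much smaller standard deviation suggests a more reliable model under the shifted test distribution.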