From 018428a3ce74ea2a7a9be7248fcdbb8d9431c96f Mon Sep 17 00:00:00 2001
From: Philip Hartout
Date: Tue, 7 Oct 2025 14:30:52 +0200
Subject: [PATCH 1/3] add benchmark results and description update to docs

---
 README.md     | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++
 docs/index.md |  2 +-
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index cf63e71..d664e4a 100644
--- a/README.md
+++ b/README.md
@@ -172,3 +172,77 @@ for metric in tqdm(metrics):
         generated,
     )
 ```
+## Example Benchmark
+
+The following results mirror the tables from our paper. Bold indicates best, and underlined indicates second-best. Values are multiplied by 100 for legibility. Standard deviations are obtained by subsampling using `StandardPGDInterval` and `MoleculePGDInterval`; the specific parameters are discussed in the paper.
+
+### Procedural and real-world graphs
+
+<table>
+  <thead>
+    <tr>
+      <th rowspan="2">Dataset</th>
+      <th rowspan="2">Model</th>
+      <th rowspan="2">VUN (↑)</th>
+      <th rowspan="2">PGD (↓)</th>
+      <th colspan="6">PGD subscores</th>
+    </tr>
+    <tr>
+      <th>Clust. (↓)</th>
+      <th>Deg. (↓)</th>
+      <th>GIN (↓)</th>
+      <th>Orb5. (↓)</th>
+      <th>Orb4. (↓)</th>
+      <th>Eig. (↓)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr><td rowspan="4">Planar-L</td><td>AutoGraph</td><td>85.1</td><td>34.0 ± 1.8</td><td>7.0 ± 2.9</td><td>7.8 ± 3.2</td><td>8.8 ± 3.0</td><td>34.0 ± 1.8</td><td>28.5 ± 1.5</td><td>26.9 ± 2.3</td></tr>
+    <tr><td>DiGress</td><td>80.1</td><td>45.2 ± 1.8</td><td>24.8 ± 2.0</td><td>23.3 ± 1.2</td><td>29.0 ± 1.1</td><td>45.2 ± 1.8</td><td>40.3 ± 1.8</td><td>39.4 ± 2.0</td></tr>
+    <tr><td>GRAN</td><td>1.6</td><td>99.7 ± 0.2</td><td>99.3 ± 0.2</td><td>98.3 ± 0.3</td><td>98.3 ± 0.3</td><td>99.7 ± 0.1</td><td>99.2 ± 0.2</td><td>98.5 ± 0.4</td></tr>
+    <tr><td>ESGG</td><td>93.9</td><td>45.0 ± 1.4</td><td>10.9 ± 3.2</td><td>21.7 ± 3.0</td><td>32.9 ± 2.2</td><td>45.0 ± 1.4</td><td>42.8 ± 1.9</td><td>29.6 ± 1.6</td></tr>
+    <tr><td rowspan="4">Lobster-L</td><td>AutoGraph</td><td>83.1</td><td>18.0 ± 1.6</td><td>4.2 ± 1.9</td><td>12.1 ± 1.6</td><td>14.8 ± 1.5</td><td>18.0 ± 1.6</td><td>16.1 ± 1.6</td><td>13.0 ± 1.1</td></tr>
+    <tr><td>DiGress</td><td>91.4</td><td>3.2 ± 2.6</td><td>2.0 ± 1.3</td><td>1.2 ± 1.5</td><td>2.3 ± 2.0</td><td>3.0 ± 3.1</td><td>4.5 ± 2.3</td><td>1.3 ± 1.1</td></tr>
+    <tr><td>GRAN</td><td>41.3</td><td>85.4 ± 0.5</td><td>20.8 ± 1.1</td><td>77.1 ± 1.2</td><td>79.8 ± 0.6</td><td>85.4 ± 0.5</td><td>85.0 ± 0.6</td><td>69.8 ± 1.2</td></tr>
+    <tr><td>ESGG</td><td>70.9</td><td>69.9 ± 0.6</td><td>0.0 ± 0.0</td><td>63.4 ± 1.1</td><td>66.8 ± 1.0</td><td>69.9 ± 0.6</td><td>66.0 ± 0.6</td><td>51.7 ± 1.8</td></tr>
+    <tr><td rowspan="4">SBM-L</td><td>AutoGraph</td><td>85.6</td><td>5.6 ± 1.5</td><td>0.3 ± 0.6</td><td>6.2 ± 1.4</td><td>6.3 ± 1.3</td><td>3.2 ± 2.2</td><td>4.4 ± 2.0</td><td>2.5 ± 2.2</td></tr>
+    <tr><td>DiGress</td><td>73.0</td><td>17.4 ± 2.3</td><td>5.7 ± 2.8</td><td>8.2 ± 3.3</td><td>13.8 ± 1.7</td><td>17.4 ± 2.3</td><td>14.8 ± 2.5</td><td>8.7 ± 3.0</td></tr>
+    <tr><td>GRAN</td><td>21.4</td><td>69.1 ± 1.4</td><td>50.2 ± 1.9</td><td>58.6 ± 1.4</td><td>69.1 ± 1.4</td><td>65.7 ± 1.3</td><td>62.8 ± 1.3</td><td>55.9 ± 1.5</td></tr>
+    <tr><td>ESGG</td><td>10.4</td><td>99.4 ± 0.2</td><td>97.9 ± 0.5</td><td>97.5 ± 0.6</td><td>98.3 ± 0.4</td><td>96.8 ± 0.4</td><td>89.2 ± 0.7</td><td>99.4 ± 0.2</td></tr>
+    <tr><td rowspan="4">Proteins</td><td>AutoGraph</td><td>-</td><td>67.7 ± 7.4</td><td>47.7 ± 5.7</td><td>31.5 ± 8.5</td><td>45.3 ± 5.1</td><td>67.7 ± 7.4</td><td>47.4 ± 7.0</td><td>53.2 ± 6.9</td></tr>
+    <tr><td>DiGress</td><td>-</td><td>88.1 ± 3.1</td><td>36.1 ± 4.3</td><td>29.2 ± 5.0</td><td>23.2 ± 5.3</td><td>88.1 ± 3.1</td><td>60.8 ± 3.6</td><td>23.4 ± 11.8</td></tr>
+    <tr><td>GRAN</td><td>-</td><td>89.7 ± 2.7</td><td>86.0 ± 2.0</td><td>70.6 ± 3.1</td><td>71.5 ± 3.0</td><td>90.4 ± 2.4</td><td>84.4 ± 3.3</td><td>76.7 ± 4.7</td></tr>
+    <tr><td>ESGG</td><td>-</td><td>79.2 ± 4.3</td><td>58.2 ± 3.6</td><td>54.0 ± 3.6</td><td>57.4 ± 4.1</td><td>80.2 ± 3.1</td><td>72.5 ± 3.0</td><td>24.3 ± 11.0</td></tr>
+  </tbody>
+</table>
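The standard deviations in the tables come from repeatedly subsampling the reference and generated sets and recomputing the metric. As a rough, library-independent sketch of that procedure (the toy metric and the function `subsampled_metric` below are illustrative stand-ins, not the `polygraph-benchmark` / `StandardPGDInterval` API):

```python
import random
import statistics


def toy_metric(sample_a, sample_b):
    # Stand-in for a graph discrepancy score: absolute difference of means.
    return abs(statistics.mean(sample_a) - statistics.mean(sample_b))


def subsampled_metric(reference, generated, n_subsamples=50, frac=0.8, seed=0):
    """Estimate mean and standard deviation of a metric by recomputing it
    on random subsamples of both the reference and generated collections."""
    rng = random.Random(seed)
    k_ref = max(1, int(len(reference) * frac))
    k_gen = max(1, int(len(generated) * frac))
    scores = [
        toy_metric(rng.sample(reference, k_ref), rng.sample(generated, k_gen))
        for _ in range(n_subsamples)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


mean_score, std_score = subsampled_metric(list(range(100)), list(range(10, 110)))
print(f"{mean_score:.1f} ± {std_score:.1f}")
```

The subsample fraction and the number of subsamples trade stability of the estimate against computation, which is why the paper discusses the specific parameters.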
+
+### Molecules
+Here we provide new benchmark values for molecules.
+
+<table>
+  <thead>
+    <tr>
+      <th rowspan="2">Dataset</th>
+      <th rowspan="2">Model</th>
+      <th rowspan="2">Valid (↑)</th>
+      <th rowspan="2">PGD (↓)</th>
+      <th colspan="5">PGD subscores</th>
+    </tr>
+    <tr>
+      <th>Topo (↓)</th>
+      <th>Morgan (↓)</th>
+      <th>ChemNet (↓)</th>
+      <th>MolCLR (↓)</th>
+      <th>Lipinski (↓)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr><td rowspan="3">Guacamol</td><td>AutoGraph</td><td>91.6</td><td>22.9 ± 0.5</td><td>8.2 ± 0.7</td><td>15.7 ± 0.8</td><td>22.9 ± 0.5</td><td>16.6 ± 0.4</td><td>19.4 ± 0.7</td></tr>
+    <tr><td>AutoGraph*</td><td>95.9</td><td>10.4 ± 1.2</td><td>4.3 ± 0.7</td><td>4.7 ± 1.4</td><td>4.6 ± 0.6</td><td>1.7 ± 1.0</td><td>10.4 ± 1.2</td></tr>
+    <tr><td>DiGress</td><td>85.2</td><td>32.7 ± 0.5</td><td>19.6 ± 0.6</td><td>20.4 ± 0.5</td><td>32.5 ± 0.7</td><td>22.9 ± 0.6</td><td>32.8 ± 0.5</td></tr>
+    <tr><td rowspan="2">Moses</td><td>AutoGraph</td><td>87.4</td><td>29.6 ± 0.4</td><td>22.4 ± 0.4</td><td>16.3 ± 1.3</td><td>25.8 ± 0.7</td><td>20.5 ± 0.5</td><td>29.6 ± 0.4</td></tr>
+    <tr><td>DiGress</td><td>85.7</td><td>33.4 ± 0.5</td><td>26.8 ± 0.4</td><td>24.8 ± 0.8</td><td>29.1 ± 0.6</td><td>24.3 ± 0.7</td><td>33.4 ± 0.5</td></tr>
+  </tbody>
+</table>
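Each subscore column corresponds to one molecular descriptor (topological and Morgan fingerprints, ChemNet and MolCLR embeddings, Lipinski features). A classifier two-sample test is one common way to turn such descriptor sets into a discrepancy score; the sketch below illustrates that general idea on plain feature vectors. The nearest-centroid classifier and the accuracy-to-score mapping are deliberate simplifications for illustration, not the estimator implemented by this library:

```python
import random
import statistics


def centroid(rows):
    # Component-wise mean of a list of equal-length feature vectors.
    return [statistics.mean(col) for col in zip(*rows)]


def dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_discrepancy(reference, generated):
    """Train a nearest-centroid classifier on half of each descriptor set,
    then map its balanced test accuracy to a score in [0, 1]:
    chance-level accuracy (0.5) -> 0, perfect separation (1.0) -> 1."""
    ref_train, ref_test = reference[: len(reference) // 2], reference[len(reference) // 2 :]
    gen_train, gen_test = generated[: len(generated) // 2], generated[len(generated) // 2 :]
    c_ref, c_gen = centroid(ref_train), centroid(gen_train)
    acc_ref = statistics.mean(
        1.0 if dist2(x, c_ref) <= dist2(x, c_gen) else 0.0 for x in ref_test
    )
    acc_gen = statistics.mean(
        1.0 if dist2(x, c_gen) < dist2(x, c_ref) else 0.0 for x in gen_test
    )
    return max(0.0, 2.0 * (acc_ref + acc_gen) / 2.0 - 1.0)


rng = random.Random(0)
reference = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(400)]
similar = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(400)]
shifted = [[rng.gauss(3.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(400)]
```

With this toy setup, indistinguishable descriptor sets score near 0 while clearly shifted ones score near 1, matching the intuition that lower values in the table indicate generated molecules closer to the reference set.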
+
+* AutoGraph* denotes a variant that leverages additional training heuristics as described in the paper.

diff --git a/docs/index.md b/docs/index.md
index f06f837..b7e06ea 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -92,7 +92,7 @@ PGD and its motivation are described in more detail in the paper and API docs.
 
 ### Benchmarking snapshot
 
-The table below shows an example benchmark generated with this library across multiple datasets and models. Values illustrate typical outputs from the implemented metrics (VUN, PGD, and PGD subscores). For completeness, this library and our paper also implements and provides various MMD estimates on the datasets below.
+The table below shows an example benchmark generated with this library across multiple datasets and models. Values illustrate typical outputs from the newly proposed PolyGraph Discrepancy. For completeness, this library and our paper also implement and provide various MMD estimates on the datasets below. Values are scaled by 100 for legibility, and subsampling (using `StandardPGDInterval` and `MoleculePGDInterval`) is used to obtain standard deviations. More details are provided in our paper.
+
+| Method | Planar-L | Lobster-L | SBM-L | Proteins | Guacamol | Moses |
+| --- | --- | --- | --- | --- | --- | --- |
+| AutoGraph | 34.0 ± 1.8 | 18.0 ± 1.6 | 5.6 ± 1.5 | 67.7 ± 7.4 | 22.9 ± 0.5 | 29.6 ± 0.4 |
+| AutoGraph* | — | — | — | — | 10.4 ± 1.2 | — |
+| DiGress | 45.2 ± 1.8 | 3.2 ± 2.6 | 17.4 ± 2.3 | 88.1 ± 3.1 | 32.7 ± 0.5 | 33.4 ± 0.5 |
+| GRAN | 99.7 ± 0.2 | 85.4 ± 0.5 | 69.1 ± 1.4 | 89.7 ± 2.7 | — | — |
+| ESGG | 45.0 ± 1.4 | 69.9 ± 0.6 | 99.4 ± 0.2 | 79.2 ± 4.3 | — | — |
+
+* AutoGraph* denotes a variant that leverages additional training heuristics as described in the paper.

diff --git a/pyproject.toml b/pyproject.toml
index bd81b41..e87419f 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "polygraph-benchmark"
-version = "1.0.0"
+version = "1.0.1"
 description = "Evaluation benchmarks for graph generative models"
 readme = "README.md"
 authors = [