
Commit ecdd658

Merge branch 'main' of github.com:geometric-intelligence/TopoBenchmark into dev

gbg141 committed Dec 18, 2024
2 parents 30547f3 + 48f9fcf
Showing 11 changed files with 282 additions and 33 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
*.ipynb linguist-vendored
44 changes: 27 additions & 17 deletions README.md
@@ -92,6 +92,8 @@ python -m topobenchmark model=cell/cwn dataset=graph/MUTAG

The same CLI override mechanism also applies when modifying finer-grained configurations within a `CONFIG GROUP`. Please refer to the official [`hydra` documentation](https://hydra.cc/docs/intro/) for further details.
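
For instance, a hedged sketch of such a nested override (the parameter paths below are illustrative assumptions about the config group layout, not confirmed keys):

```bash
# Illustrative only: the nested keys are assumptions about the config layout.
python -m topobenchmark \
    model=cell/cwn \
    dataset=graph/MUTAG \
    optimizer.parameters.lr=0.01 \
    trainer.max_epochs=100
```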



## :bike: Experiments Reproducibility
To reproduce Table 1 from the [`TopoBenchmark: A Framework for Benchmarking Topological Deep Learning`](https://arxiv.org/pdf/2406.06642) paper, please run the following command:

@@ -116,6 +118,7 @@ We list the neural networks trained and evaluated by `TopoBenchmark`, organized
| GAT | [Graph Attention Networks](https://openreview.net/pdf?id=rJXMpikCZ) |
| GIN | [How Powerful are Graph Neural Networks?](https://openreview.net/pdf?id=ryGs6iA5Km) |
| GCN | [Semi-Supervised Classification with Graph Convolutional Networks](https://arxiv.org/pdf/1609.02907v4) |
| GraphMLP | [Graph-MLP: Node Classification without Message Passing in Graph](https://arxiv.org/pdf/2106.04051) |

### Simplicial complexes
| Model | Reference |
@@ -145,7 +148,7 @@ We list the neural networks trained and evaluated by `TopoBenchmark`, organized
### Combinatorial complexes
| Model | Reference |
| --- | --- |
| GCCN | [Generalized Combinatorial Complex Neural Networks](https://arxiv.org/pdf/2410.06530) |
| GCCN | [TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks](https://arxiv.org/pdf/2410.06530) |

## :bulb: TopoTune

@@ -178,12 +181,17 @@ python -m topobenchmark \

To use a single augmented Hasse graph expansion, use `model={domain}/topotune_onehasse` instead of `model={domain}/topotune`.
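
For instance, reusing the dataset from the earlier example:

```bash
# Single augmented Hasse graph expansion of the cell-domain TopoTune model.
python -m topobenchmark model=cell/topotune_onehasse dataset=graph/MUTAG
```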

To specify a set of neighborhoods (routes) on the complex, use a list of neighborhoods each specified as `\[\[{source_rank}, {destination_rank}\], {neighborhood}\]`. Currently, the following options for `{neighborhood}` are supported:
- `up_laplacian`, from rank $r$ to $r$
- `down_laplacian`, from rank $r$ to $r$
- `boundary`, from rank $r$ to $r-1$
- `coboundary`, from rank $r$ to $r+1$
- `adjacency`, from rank $r$ to $r$ (stand-in for `up_adjacency`, as `down_adjacency` not yet supported in TopoBenchmark)
To specify a set of neighborhoods on the complex, use a list of neighborhoods, each specified as a string of the form
`r-{neighborhood}-k`, where $k$ is the source cell rank and $r$ is the number of ranks up or down that the selected `{neighborhood}` considers. Currently, the following options for `{neighborhood}` are supported:
- `up_laplacian`, between cells of rank $k$ via cells of rank $k+r$.
- `down_laplacian`, between cells of rank $k$ via cells of rank $k-r$.
- `hodge_laplacian`, between cells of rank $k$ via cells of both ranks $k-r$ and $k+r$.
- `up_adjacency`, between cells of rank $k$ via cells of rank $k+r$.
- `down_adjacency`, between cells of rank $k$ via cells of rank $k-r$.
- `up_incidence`, from rank $k$ to $k+r$.
- `down_incidence`, from rank $k$ to $k-r$.

The number $r$ can be omitted, in which case $r=1$ by default (e.g. `up_incidence-k` represents the incidence from rank $k$ to $k+1$).
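
As a hedged sketch, such a list could be passed on the command line roughly as follows; the override key `model.backbone.neighborhoods` is an assumption about the TopoTune config layout rather than a confirmed path:

```bash
# Illustrative only: the override key is an assumption, not a confirmed config path.
python -m topobenchmark \
    model=cell/topotune \
    dataset=graph/MUTAG \
    'model.backbone.neighborhoods=[1-up_laplacian-0,1-down_incidence-2,up_adjacency-1]'
```

Here `1-up_laplacian-0` connects rank-0 cells via rank-1 cells, `1-down_incidence-2` maps rank-2 cells down to rank-1 cells, and `up_adjacency-1` uses the default $r=1$.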


### Using backbone models from any package
@@ -235,16 +243,18 @@ We list the liftings used in `TopoBenchmark` to transform datasets. Here, a _lif

</details>

## Data Transformations
<details>
<summary><b> Data Transformations </b></summary>

| Transform | Description | Reference |
| --- | --- | --- |
| Message Passing Homophily | Higher-order homophily measure for hypergraphs | [Source](https://arxiv.org/abs/2310.07684) |
| Group Homophily | Higher-order homophily measure for hypergraphs that considers groups of predefined sizes | [Source](https://arxiv.org/abs/2103.11818) |
</details>

## :books: Datasets


### Graphs
| Dataset | Task | Description | Reference |
| --- | --- | --- | --- |
| Cora | Classification | Cocitation dataset. | [Source](https://link.springer.com/article/10.1023/A:1009953814988) |
@@ -264,14 +274,14 @@ We list the liftings used in `TopoBenchmark` to transform datasets. Here, a _lif
| US-county-demos | Regression | Each node attribute is used in turn as the target label. | [Source](https://arxiv.org/pdf/2002.08274) |
| ZINC | Regression | Graph-level regression. | [Source](https://pubs.acs.org/doi/10.1021/ci3001277) |




## :hammer_and_wrench: Development

To join the development of `TopoBenchmark`, you should install the library in dev mode.

For this, you can create an environment using conda or Docker. Please follow the steps in <a href="#jigsaw-get-started">:jigsaw: Get Started</a>.
### Hypergraphs
| Dataset | Task | Description | Reference |
| --- | --- | --- | --- |
| Cora-Cocitation | Classification | Cocitation dataset. | [Source](https://proceedings.neurips.cc/paper_files/paper/2019/file/1efa39bcaec6f3900149160693694536-Paper.pdf) |
| Citeseer-Cocitation | Classification | Cocitation dataset. | [Source](https://proceedings.neurips.cc/paper_files/paper/2019/file/1efa39bcaec6f3900149160693694536-Paper.pdf) |
| PubMed-Cocitation | Classification | Cocitation dataset. | [Source](https://proceedings.neurips.cc/paper_files/paper/2019/file/1efa39bcaec6f3900149160693694536-Paper.pdf) |
| Cora-Coauthorship | Classification | Coauthorship dataset. | [Source](https://proceedings.neurips.cc/paper_files/paper/2019/file/1efa39bcaec6f3900149160693694536-Paper.pdf) |
| DBLP-Coauthorship | Classification | Coauthorship dataset. | [Source](https://proceedings.neurips.cc/paper_files/paper/2019/file/1efa39bcaec6f3900149160693694536-Paper.pdf) |



2 changes: 1 addition & 1 deletion configs/evaluator/default.yaml
@@ -6,5 +6,5 @@ num_classes: ${dataset.parameters.num_classes}
# Automatically selects the default metrics depending on the task
# Classification: [accuracy, precision, recall, auroc]
# Regression: [mae, mse]
metrics: ${get_default_metrics:${evaluator.task}}
metrics: ${get_default_metrics:${evaluator.task},${oc.select:dataset.parameters.metrics,null}}
# Select classification/regression config files to manually define the metrics
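
As a hedged illustration of what this resolver change enables: a dataset config can now declare its own metric list via `dataset.parameters.metrics` (picked up through `oc.select`), falling back to the task defaults otherwise. The override below is a sketch; the `+` prefix assumes the key is absent from the chosen dataset config.

```bash
# Illustrative only: request the custom "example" metric plus MAE for a regression dataset.
python -m topobenchmark dataset=graph/ZINC model=cell/topotune '+dataset.parameters.metrics=[example,mae]'
```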
4 changes: 2 additions & 2 deletions configs/run.yaml
@@ -4,8 +4,8 @@
# order of defaults determines the order in which configs override each other
defaults:
- _self_
- dataset: graph/cocitation_cora
- model: graph/gcn_dgm
- dataset: graph/ZINC
- model: cell/topotune
- transforms: ${get_default_transform:${dataset},${model}} #tree #${get_default_transform:${dataset},${model}} #no_transform
- optimizer: default
- loss: default
38 changes: 33 additions & 5 deletions test/evaluator/test_evaluator.py
@@ -1,15 +1,43 @@
""" Test the TBEvaluator class."""
import pytest

import torch
from topobenchmark.evaluator import TBEvaluator

class TestTBEvaluator:
""" Test the TBXEvaluator class."""

def setup_method(self):
""" Setup the test."""
self.evaluator_multilable = TBEvaluator(task="multilabel classification")
self.evaluator_regression = TBEvaluator(task="regression")
self.classification_metrics = ["accuracy", "precision", "recall", "auroc"]
self.evaluator_classification = TBEvaluator(task="classification", num_classes=3, metrics=self.classification_metrics)
self.evaluator_multilabel = TBEvaluator(task="multilabel classification", num_classes=2, metrics=self.classification_metrics)
self.regression_metrics = ["example", "mae"]
self.evaluator_regression = TBEvaluator(task="regression", num_classes=1, metrics=self.regression_metrics)
with pytest.raises(ValueError):
TBEvaluator(task="wrong")
repr = self.evaluator_multilable.__repr__()
TBEvaluator(task="wrong", num_classes=2, metrics=self.classification_metrics)

def test_repr(self):
"""Test the __repr__ method."""
assert "TBEvaluator" in self.evaluator_classification.__repr__()
assert "TBEvaluator" in self.evaluator_multilabel.__repr__()
assert "TBEvaluator" in self.evaluator_regression.__repr__()

def test_update_and_compute(self):
"""Test the update and compute methods."""
self.evaluator_classification.update({"logits": torch.randn(10, 3), "labels": torch.randint(0, 3, (10,))})
out = self.evaluator_classification.compute()
for metric in self.classification_metrics:
assert metric in out
self.evaluator_multilabel.update({"logits": torch.randn(10, 2), "labels": torch.randint(0, 2, (10, 2))})
out = self.evaluator_multilabel.compute()
for metric in self.classification_metrics:
assert metric in out
self.evaluator_regression.update({"logits": torch.randn(10, 1), "labels": torch.randn(10,)})
out = self.evaluator_regression.compute()
for metric in self.regression_metrics:
assert metric in out

def test_reset(self):
"""Test the reset method."""
self.evaluator_multilabel.reset()
self.evaluator_regression.reset()
3 changes: 3 additions & 0 deletions test/utils/test_config_resolvers.py
@@ -117,6 +117,9 @@ def test_infer_num_cell_dimensions(self):

def test_get_default_metrics(self):
"""Test get_default_metrics."""
out = get_default_metrics("classification", ["accuracy", "precision"])
assert out == ["accuracy", "precision"]

out = get_default_metrics("classification")
assert out == ["accuracy", "precision", "recall", "auroc"]

3 changes: 3 additions & 0 deletions topobenchmark/evaluator/__init__.py
@@ -3,6 +3,8 @@
from torchmetrics.classification import AUROC, Accuracy, Precision, Recall
from torchmetrics.regression import MeanAbsoluteError, MeanSquaredError

from .metrics import ExampleRegressionMetric

# Define metrics
METRICS = {
"accuracy": Accuracy,
@@ -11,6 +13,7 @@
"auroc": AUROC,
"mae": MeanAbsoluteError,
"mse": MeanSquaredError,
"example": ExampleRegressionMetric,
}

from .base import AbstractEvaluator # noqa: E402
8 changes: 6 additions & 2 deletions topobenchmark/evaluator/evaluator.py
@@ -37,14 +37,15 @@ def __init__(self, task, **kwargs):
elif self.task == "multilabel classification":
parameters = {"num_classes": kwargs["num_classes"]}
parameters["task"] = "multilabel"
parameters["num_labels"] = kwargs["num_classes"]
metric_names = kwargs["metrics"]

elif self.task == "regression":
parameters = {}
metric_names = kwargs["metrics"]

else:
raise ValueError(f"Invalid task {kwargs['task']}")
raise ValueError(f"Invalid task {task}")

metrics = {}
for name in metric_names:
@@ -83,7 +84,10 @@ def update(self, model_out: dict):
if self.task == "regression":
self.metrics.update(preds, target.unsqueeze(1))

elif self.task == "classification":
elif (
self.task == "classification"
or self.task == "multilabel classification"
):
self.metrics.update(preds, target)

else:
108 changes: 108 additions & 0 deletions topobenchmark/evaluator/metrics/__init__.py
@@ -0,0 +1,108 @@
"""Init file for custom metrics in evaluator module."""

import importlib
import inspect
import sys
from pathlib import Path
from typing import Any


class LoadManager:
"""Manages automatic discovery and registration of loss classes."""

@staticmethod
def is_metric_class(obj: Any) -> bool:
"""Check if an object is a valid metric class.
Parameters
----------
obj : Any
The object to check if it's a valid loss class.
Returns
-------
bool
True if the object is a valid loss class (non-private class
with 'FeatureEncoder' in name), False otherwise.
"""
try:
from torchmetrics import Metric

return (
inspect.isclass(obj)
and not obj.__name__.startswith("_")
and issubclass(obj, Metric)
and obj is not Metric
)
except ImportError:
return False

@classmethod
def discover_metrics(cls, package_path: str) -> dict[str, type]:
"""Dynamically discover all metric classes in the package.
Parameters
----------
package_path : str
Path to the package's __init__.py file.
Returns
-------
Dict[str, Type]
Dictionary mapping loss class names to their corresponding class objects.
"""
metrics = {}
package_dir = Path(package_path).parent

# Add parent directory to sys.path to ensure imports work
parent_dir = str(package_dir.parent)
if parent_dir not in sys.path:
sys.path.insert(0, parent_dir)

# Iterate through all .py files in the directory
for file_path in package_dir.glob("*.py"):
if file_path.stem == "__init__":
continue

try:
# Use importlib to safely import the module
module_name = f"{package_dir.stem}.{file_path.stem}"
module = importlib.import_module(module_name)

# Find all loss classes in the module
for name, obj in inspect.getmembers(module):
if (
cls.is_metric_class(obj)
and obj.__module__ == module.__name__
):
metrics[name] = obj # noqa: PERF403

except ImportError as e:
print(f"Could not import module {module_name}: {e}")

return metrics


# Dynamically create the loss manager and discover losses
manager = LoadManager()
CUSTOM_METRICS = manager.discover_metrics(__file__)
CUSTOM_METRICS_list = list(CUSTOM_METRICS.keys())

# Combine manual and discovered losses
all_metrics = {**CUSTOM_METRICS}

# Generate __all__
__all__ = [
"CUSTOM_METRICS",
"CUSTOM_METRICS_list",
*list(all_metrics.keys()),
]

# Update locals for direct import
locals().update(all_metrics)

# from .example import ExampleRegressionMetric

# __all__ = [
# "ExampleRegressionMetric",
# ]
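
For reference, a hedged sketch of the kind of module this loader would discover: any file in the package that defines a public `torchmetrics.Metric` subclass gets registered under its class name. The file name and class below are hypothetical and not part of this commit.

```python
# Hypothetical topobenchmark/evaluator/metrics/example_mae.py (illustrative only).
import torch
from torchmetrics import Metric


class ExampleMeanAbsoluteError(Metric):
    """Toy regression metric that LoadManager would auto-discover."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # State tensors are synchronized across processes with a "sum" reduction.
        self.add_state("sum_abs_error", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        """Accumulate absolute errors from one batch."""
        self.sum_abs_error += torch.abs(preds - target).sum()
        self.total += target.numel()

    def compute(self) -> torch.Tensor:
        """Return the mean absolute error over all batches seen so far."""
        return self.sum_abs_error / self.total
```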