Experiments for Flood complex

This repository contains the code used for the experiments in the NeurIPS 2025 paper "The Flood Complex: Large-Scale Persistent Homology on Millions of Points".

Please cite as:

@inproceedings{graf2025floodcomplex,
      title={The Flood Complex: Large-Scale Persistent Homology on Millions of Points}, 
      author={Graf, Florian and Pellizzoni, Paolo and Uray, Martin and Huber, Stefan and Kwitt, Roland},
      year={2025},
      booktitle={NeurIPS},
}

See the flooder GitHub page for the official implementation of the Flood Complex.

Setup

In the following, we assume that the repository has been cloned into /tmp/flooder-experiments.

Set up a new Anaconda environment

conda create -n "flooder-experiments" python=3.9
conda activate flooder-experiments

Installing `flooder`

First, install our implementation of the Flood complex. This already installs PyTorch and most of the dependencies required to run the experiments.

pip install flooder

Installing `torchph`

git clone https://github.com/c-hofer/torchph.git ../torchph
conda develop ../torchph/
python -c 'import torchph' # check

Installing additional dependencies

pip install matplotlib
pip install "flaml[automl]"
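
Before running the experiments, it can help to confirm that the packages installed above are importable. The following is a minimal sketch and not part of the repository. It assumes that each import name matches the package name used above (`flooder`, `torchph`, `torch`, `matplotlib`, `flaml`); the helper `check_imports` is a hypothetical name.

```python
import importlib.util

# Import names are assumed to match the package names installed above.
REQUIRED = ["flooder", "torchph", "torch", "matplotlib", "flaml"]


def check_imports(names):
    """Return {name: bool}, True if the module can be located without importing it."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_imports(REQUIRED).items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

Any package reported as MISSING should be installed into the active `flooder-experiments` environment before continuing.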

Scalability

Experiments on the scalability of the Flood complex can be found in the scalability/ folder.

We can reproduce the results of Figure 4 by running:

python scalability/synthetic.py --run-alpha --outdir output/

Moreover, experiments on the large-scale point clouds can be run using:

python scalability/large_scale.py --root bench_dataset/ --idx 0 --outdir output/

Reproducing ML results

All code needed to reproduce the machine learning results obtained with the Flood complex can be found in the learning/ folder.

Obtaining PH diagrams

We can run the Flood complex on a dataset as follows:

python learning/ph_flood.py --dataset coral --root data/coral/

Similarly, we can subsample 20k points from the point clouds and run the Alpha complex:

python learning/ph_alpha.py --dataset coral --root data/coral/ --num-points 20000

We can also draw several smaller subsamples, e.g. 4,000 points with seeds 1 to 5:

for s in {1..5}; do python learning/ph_alpha.py --dataset coral --root data/coral/ --num-points 4000 --seed "$s"; done

Since point clouds in the rocks dataset vary in size, we subsample as a fraction of their size:

python learning/ph_alpha.py --dataset rocks --root data/rocks/ --scale-num-points 0.01
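
To make the two subsampling modes concrete, the sketch below picks a seeded random subsample either by absolute count (as `--num-points` suggests) or as a fraction of the cloud size (as `--scale-num-points` suggests). This is an illustration under assumptions, not the code in learning/ph_alpha.py: the function `subsample_indices`, its rounding and clamping behaviour, and the use of Python's standard `random` module are all choices made here.

```python
import random


def subsample_indices(n_total, num_points=None, scale=None, seed=0):
    """Pick indices of a random subsample without replacement.

    Exactly one of `num_points` (absolute count) or `scale`
    (fraction of the cloud size) must be given.
    """
    if (num_points is None) == (scale is None):
        raise ValueError("give exactly one of num_points or scale")
    k = num_points if num_points is not None else int(round(scale * n_total))
    k = max(1, min(k, n_total))  # clamp to a valid sample size
    rng = random.Random(seed)  # a fixed seed makes the subsample reproducible
    return sorted(rng.sample(range(n_total), k))


# Fixed-size subsample, in the spirit of --num-points 4000 --seed 1
idx_fixed = subsample_indices(1_000_000, num_points=4000, seed=1)
# Size-relative subsample, in the spirit of --scale-num-points 0.01
idx_scaled = subsample_indices(250_000, scale=0.01)
```

With a fractional size, large and small point clouds are thinned by the same factor, whereas a fixed count would thin large clouds much more aggressively than small ones.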

Using PH diagrams for classification and regression

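We can then run the classification pipeline on the Flood PH diagrams, on the Alpha PH diagrams of the single 20k-point subsample, and on the Alpha PH diagrams of the five 4,000-point subsamples, respectively: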

python learning/ph_classify.py --dataset coral --root data/coral/ --phdir data/coral/floodph/ --flaml-classifier lrl1 --stretch-quantile 0.05 --stats-file ./output/flood_corals_600s_stretch0.05.yaml --time-budget 30

python learning/ph_classify.py --dataset coral --root data/coral/ --phdir data/coral/alphaph_20000_0/ --flaml-classifier lrl1 --stretch-quantile 0.05 --stats-file ./output/alpha_corals_600s_stretch0.05.yaml --time-budget 30

python learning/ph_classify_subsample.py --dataset coral --root data/coral/ --phdirs data/coral/alphaph_4000_1/ data/coral/alphaph_4000_2/ data/coral/alphaph_4000_3/ data/coral/alphaph_4000_4/ data/coral/alphaph_4000_5/ --flaml-classifier lrl1 --stretch-quantile 0.05 --stats-file ./output/avg_corals_600s_stretch0.05.yaml --time-budget 30
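
Writing out one `--phdirs` entry per seed by hand is error-prone, so the call can also be assembled programmatically. The sketch below rebuilds the command above from a list of seeds. It assumes the directory pattern `alphaph_<num_points>_<seed>/`, which is inferred from the paths used in this README rather than confirmed from the scripts; the helper `subsample_classify_cmd` is a hypothetical name.

```python
import shlex


def subsample_classify_cmd(dataset, num_points, seeds, stats_file, time_budget=30):
    """Assemble the ph_classify_subsample.py call for a set of subsample seeds."""
    root = f"data/{dataset}/"
    # Assumed directory pattern: alphaph_<num_points>_<seed>/
    phdirs = [f"{root}alphaph_{num_points}_{s}/" for s in seeds]
    return [
        "python", "learning/ph_classify_subsample.py",
        "--dataset", dataset,
        "--root", root,
        "--phdirs", *phdirs,
        "--flaml-classifier", "lrl1",
        "--stretch-quantile", "0.05",
        "--stats-file", stats_file,
        "--time-budget", str(time_budget),
    ]


cmd = subsample_classify_cmd(
    "coral", 4000, range(1, 6), "./output/avg_corals_600s_stretch0.05.yaml"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.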

Similarly, for classification with an LGBM classifier, we run (e.g. for Flood PH):

python learning/ph_classify.py --dataset mcb --root data/mcb/ --phdir data/mcb/floodph/ --flaml-classifier lgbm --stretch-quantile 0.05 --stats-file ./output/flood_mcb_600s_stretch0.05.yaml --time-budget 30

And for regression (e.g. for Flood PH):

python learning/ph_regression.py --dataset rocks --root data/rocks/ --phdir data/rocks/floodph/ --flaml-classifier lgbm --stretch-quantile 0.05 --stats-file ./output/flood_rocks_600s_stretch0.05.yaml --time-budget 30

Neural network baselines

We run neural network baselines to compare against the PH-based methods. In particular, we use pointnet++, pvt and pointmlp as baselines. They require additional dependencies:

pip install learning/baselines/pointnet2_ops_lib --no-build-isolation
pip install torch-geometric
pip install torch-cluster
pip install einops
pip install ninja
pip install timm

A baseline can then be run as follows (e.g. pointnet++ on mcb):

python learning/nn_baselines_cls.py --model pointnet++ --dataset mcb --root data/mcb/ --data-augmentation --lr 0.001 --stats-file output_nn/mcb_pointnet.yaml

For the rocks dataset, we sample points from a contiguous region rather than using random sampling:

python learning/nn_baselines_cls.py --model pointnet++ --dataset rocks --root data/rocks/ --from-corner --lr 0.001 --stats-file output/rocks_pointnet.yaml
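
To illustrate the difference from random sampling, the sketch below selects a contiguous region by keeping the k points closest to the minimum corner of the bounding box. This is only one plausible reading of `--from-corner`; how learning/nn_baselines_cls.py actually defines the region is not stated here and may differ. The function `sample_from_corner` is a hypothetical name.

```python
import math


def sample_from_corner(points, k):
    """Return the k points closest to the minimum corner of the bounding box.

    Unlike uniform random sampling, the selected points form one
    contiguous region of the cloud.
    """
    dim = len(points[0])
    corner = tuple(min(p[d] for p in points) for d in range(dim))
    ranked = sorted(points, key=lambda p: math.dist(p, corner))
    return ranked[:k]


# Toy example: a 4 x 4 x 4 grid of points; keep the 8 nearest to the corner.
grid = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
region = sample_from_corner(grid, 8)
```

On the toy grid, the selected points are exactly the 2 x 2 x 2 block at the origin, whereas a uniform random sample of 8 points would be scattered across the whole grid.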

Scalability/accuracy trade-offs on swisscheese

We can reproduce the results of Figure 5 by running:

python learning/cheese_sweep.py --root data/sweep/ --output output/
