Fields of The World (FTW) - Baselines Codebase

Fields of The World (FTW) is a large-scale benchmark dataset designed to advance machine learning models for instance segmentation of agricultural field boundaries. This dataset supports the need for accurate and scalable field boundary data, which is essential for global agricultural monitoring, land use assessments, and environmental studies.

This repository provides the codebase for working with the FTW dataset, including tools for data pre-processing, model training, and evaluation.

Table of Contents

  Folder structure
  System setup
  Dataset setup
  Dataset visualization
  Pre-requisites for experimentation
  Experimentation
  Parallel experimentation
  Notes
  Contributing
  License

Folder structure

Fields-of-The-World
├── .flake8
├── .gitignore
├── CHANGELOGS.md
├── LICENSE
├── README.md
├── assets
├── configs
│   └── example_config.yaml
├── env.yml
├── inference.py
├── notebooks
│   └── visualize_dataset.ipynb
├── pyproject.toml
└── src
   ├── ftw
   │   ├── __init__.py
   │   ├── datamodules.py
   │   ├── datasets.py
   │   ├── metrics.py
   │   ├── trainers.py
   │   └── utils.py
   └── ftw_cli
       ├── __init__.py
       ├── cli.py
       ├── download.py
       ├── model.py
       └── unpack.py

System setup

Create Conda/Mamba environment

To set up the environment using the provided env.yml file:

mamba env create -f env.yml
mamba activate ftw

Verify PyTorch installation and CUDA availability

Verify that PyTorch and CUDA are installed correctly (if using a GPU):

python -c "import torch; print(torch.cuda.is_available())"

Setup FTW CLI

Installing the package in editable mode creates the ftw command-line tool, which is used to download and unpack the data:

pip install -e .

Running ftw --help shows the available commands:

Usage: ftw [OPTIONS] COMMAND [ARGS]...

  Fields of The World (FTW) - Command Line Interface

Options:
  --help  Show this message and exit.

Commands:
  download  Download the FTW dataset.
  model     Model-related commands.
  unpack    Unpack the downloaded FTW dataset.

Dataset setup

Download the dataset using the FTW CLI. The --root_folder option defaults to ./data, and --clean_download deletes the existing root folder before freshly downloading the entire dataset:

ftw download --help
Usage: ftw download [OPTIONS]

  Download the FTW dataset.

Options:
  --clean_download    If set, the script will delete the root folder before
                      downloading.
  --root_folder TEXT  Root folder where the files will be downloaded. Defaults
                      to './data'.
  --countries TEXT    Comma-separated list of countries to download. If 'all'
                      is passed, downloads all available countries.
  --help              Show this message and exit.

Unpack the dataset using the ftw unpack command; this will create a ftw folder under the root folder after unpacking.

ftw unpack --help
Usage: ftw unpack [OPTIONS]

  Unpack the downloaded FTW dataset.

Options:
  --root_folder TEXT  Root folder where the .zip files are located. Defaults
                      to './data'.
  --help              Show this message and exit.

Examples:

To download and unpack the complete dataset, use the following commands:

ftw download 
ftw unpack

To download and unpack a specific set of countries, use the following commands:

ftw download --countries belgium,kenya,vietnam
ftw unpack

Note: Make sure there are no spaces in the comma-separated list of countries.

Dataset visualization

Explore visualize_dataset.ipynb to learn more about the dataset.

[Sample 1 and Sample 2: example image/mask visualizations from the notebook]
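
For a quick look outside the notebook, the sketch below reads one image/mask pair with rasterio and plots them side by side. The file paths are illustrative; adjust them to wherever you unpacked the data.

import matplotlib.pyplot as plt
import rasterio

# Illustrative paths; adapt to your unpacked dataset layout.
image_path = "data/ftw/belgium/s2_images/window_a/1234.tif"
mask_path = "data/ftw/belgium/label_masks/semantic_2class/1234.tif"

with rasterio.open(image_path) as src:
    # Read the first three bands and move channels last for plotting.
    image = src.read([1, 2, 3]).transpose(1, 2, 0)
with rasterio.open(mask_path) as src:
    mask = src.read(1)

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(image / image.max())  # crude normalization for display
axes[0].set_title("Satellite window")
axes[1].imshow(mask, cmap="tab10", interpolation="nearest")
axes[1].set_title("Field mask")
plt.show()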

Pre-requisites for experimentation

Before running experiments, make sure to create configuration files in the configs directory. These files should specify the root directory of the dataset. Additionally, update the root argument in datasets.py to reflect the correct dataset path.

example_config.yaml gives an idea of the parameters that can be changed to spin up experiments.

trainer:
  max_epochs: <E.G. 100, NUMBER OF EPOCHS>
  log_every_n_steps: <E.G. 10, LOGGING FREQUENCY>
  accelerator: <E.G. "gpu", ACCELERATOR>
  default_root_dir: <LOGS DIRECTORY>
  devices:
    - <DEVICE ID>
  callbacks:
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        monitor: val_loss
        mode: min
        save_top_k: <E.G. 0, NUMBER OF MODELS TO SAVE>
        save_last: <TRUE / FALSE, WHETHER TO SAVE THE LAST MODEL OR NOT>
        filename: "{epoch}-{val_loss:.2f}"
model:
  class_path: ftw.trainers.CustomSemanticSegmentationTask
  init_args:
    loss: <E.G. "jaccard", LOSS FUNCTION>
    model: <E.G. "unet", MODEL>
    backbone: <E.G. "efficientnet-b3", BACKBONE MODEL>
    weights: <TRUE / FALSE, WHETHER TO USE PRETRAINED WEIGHTS OR NOT>
    patch_weights: <TRUE / FALSE, WHETHER TO PATCH THE WEIGHTS IN A CUSTOM FORMAT OR NOT>
    in_channels: <E.G. 8, NUMBER OF INPUT CHANNELS>
    num_classes: <E.G. 3, NUMBER OF CLASSES>
    num_filters: <E.G. 64, NUMBER OF FILTERS>
    ignore_index: null
    lr: <E.G. 1e-3, LEARNING RATE>
    patience: <E.G. 100, PATIENCE FOR COSINE ANNEALING>
data:
  class_path: ftw.datamodules.FTWDataModule
  init_args:
    batch_size: 32
    num_workers: 8
    num_samples: -1
    train_countries:
      - country 1
      - country 2
    val_countries:
      - country 1
      - country 2
    test_countries:
      - country 1
      - country 2
  dict_kwargs:
    root: <ROOT FOLDER OF THE DATASET>
    load_boundaries: <TRUE / FALSE WHETHER TO LOAD 3 CLASS MASKS OR NOT>
seed_everything: <SEED VALUE>
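
Before launching a run, it can help to sanity-check that your filled-in YAML parses and contains the expected top-level sections. A minimal sketch (the key names follow the template above; the config path is yours):

import yaml

# Load the experiment config and verify the expected top-level keys.
with open("configs/example_config.yaml") as f:
    config = yaml.safe_load(f)

for key in ("trainer", "model", "data"):
    assert key in config, f"missing top-level key: {key}"

# Example: inspect the dataset root the datamodule will read from.
print(config["data"]["dict_kwargs"]["root"])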

Experimentation

This section provides guidelines for running model training, testing, and experimentation using multiple GPUs and configuration files.

ftw model --help
  Usage: ftw model [OPTIONS] COMMAND [ARGS]...

  Model-related commands.

Options:
  --help  Show this message and exit.

Commands:
  fit   Fit the model
  test  Test the model

Training

We use LightningCLI to streamline the training process, leveraging configuration files to define the model architecture, dataset, and training parameters.

To train a model from scratch:

ftw model fit --help

Usage: ftw model fit [OPTIONS] [CLI_ARGS]...

  Fit the model

Options:
  --config PATH  Path to the config file (required)  [required]
  --help         Show this message and exit.

You can train your model using a configuration file as follows:

ftw model fit --config configs/example_config.yaml

To resume training from a checkpoint:

If training has been interrupted or if you wish to fine-tune a pre-trained model, you can resume training from a checkpoint:

ftw model fit --config configs/example_config.yaml --ckpt_path <Checkpoint File Path>
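
If you are unsure which checkpoint to resume from, a small helper like the one below finds the most recently written last.ckpt under your logs directory (illustrative; assumes Lightning's default checkpoint layout under default_root_dir):

import glob
import os

# Collect every "last.ckpt" written under logs/ and pick the newest one.
candidates = glob.glob("logs/**/checkpoints/last.ckpt", recursive=True)
latest = max(candidates, key=os.path.getmtime)
print(latest)  # pass this path to --ckpt_path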

Testing

Once your model has been trained, you can evaluate it on the test set specified in your datamodule. This can be done using the same configuration file used for training.

ftw model test --help

Usage: ftw model test [OPTIONS] [CLI_ARGS]...

  Test the model

Options:
  --checkpoint_fn TEXT        Path to model checkpoint  [required]
  --root_dir TEXT             Root directory of dataset
  --gpu INTEGER               GPU to use
  --countries TEXT            Countries to evaluate on  [required]
  --postprocess               Apply postprocessing to the model output
  --iou_threshold FLOAT       IoU threshold for matching predictions to ground
                              truths
  --output_fn TEXT            Output file for metrics
  --model_predicts_3_classes  Whether the model predicts 3 classes or 2
                              classes
  --test_on_3_classes         Whether to test on 3 classes or 2 classes
  --temporal_options TEXT     Temporal option (stacked, windowA, windowB,
                              etc.)
  --help                      Show this message and exit.

To test a model:

When testing the model with the FTW CLI, you can pass specific options, such as selecting the GPU, providing a checkpoint, specifying countries to test on, and postprocessing the results:

ftw model test --gpu 0 --checkpoint_fn logs/path_to_model/checkpoints/last.ckpt --countries denmark finland --postprocess --output_fn results.csv

This will write the test results to results.csv after running on the selected GPU and processing the specified countries.

Note: If your data directory is not the default ./data/, pass the custom path during testing with --root_dir custom_dir/ftw.

Parallel experimentation

For running multiple experiments across different GPUs in parallel, use the provided run_experiments.py script. It manages and distributes training tasks across the available GPUs using multiprocessing and a work queue.

To run experiments in parallel:

  1. Define the list of experiment configuration files in the experiment_configs list.
  2. Specify the list of GPUs in the GPUS variable (e.g., [0,1,2,3]).
  3. Set DRY_RUN = False to execute the experiments.

The script automatically detects the available GPUs and runs the specified experiments on them. Each experiment will use the configuration file specified in experiment_configs.

python run_experiments.py

The script distributes the experiments across the specified GPUs using a queue; as each GPU finishes an experiment, it picks up the next one.
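
The core pattern is small enough to sketch. The snippet below is a simplified illustration of the same idea; the variable names mirror those described above, but the actual run_experiments.py may differ in detail:

import os
import subprocess
from multiprocessing import Process, Queue
from queue import Empty

# Illustrative values mirroring the script's variables.
experiment_configs = ["configs/exp_a.yaml", "configs/exp_b.yaml"]
GPUS = [0, 1]
DRY_RUN = False

def worker(gpu: int, work_queue: Queue) -> None:
    # Consume experiment configs from the shared queue until it is empty.
    while True:
        try:
            config = work_queue.get_nowait()
        except Empty:
            return
        cmd = ["ftw", "model", "fit", "--config", config]
        if DRY_RUN:
            print(f"[GPU {gpu}] would run: {' '.join(cmd)}")
            continue
        # Pin this worker's jobs to one GPU via CUDA_VISIBLE_DEVICES.
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
        subprocess.run(cmd, env=env, check=True)

if __name__ == "__main__":
    work_queue: Queue = Queue()
    for config in experiment_configs:
        work_queue.put(config)
    # One worker process per GPU, each pulling experiments off the queue.
    processes = [Process(target=worker, args=(gpu, work_queue)) for gpu in GPUS]
    for p in processes:
        p.start()
    for p in processes:
        p.join()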

Notes

If you see warnings in this format:

/home/byteboogie/miniforge3/envs/ftw/lib/python3.12/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)

this is due to a change in PyTorch itself: PyTorch 2.4 deprecated the torch.cuda.amp APIs (such as torch.cuda.amp.autocast and torch.cuda.amp.custom_fwd) in favor of their torch.amp equivalents, e.g. torch.amp.autocast("cuda", ...), but internal uses in PyTorch and downstream libraries such as kornia have not all been updated yet. Rest assured, ftw won't face any issue in experimentation and dataset exploration.
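
If you want to silence the warning in your own scripts, a minimal sketch using Python's standard warnings module (set the filter before importing kornia or the ftw modules, since the warning fires at import time):

import warnings

# Ignore the torch.cuda.amp deprecation FutureWarning raised when kornia
# is imported; this only hides the message and changes no behavior.
warnings.filterwarnings("ignore", category=FutureWarning, module="kornia.*")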

Contributing

We welcome contributions! Please fork the repository, make your changes, and submit a pull request. For any issues, feel free to open an issue ticket.

License

This codebase is released under the MIT License. See the LICENSE file for details.
