Robots are frequently tasked to gather relevant sensor data in unknown terrains. A key challenge for classical path planning algorithms used for autonomous information gathering is adaptively replanning paths online as the terrain is explored, given limited onboard compute resources. Recently, learning-based approaches have emerged that train planning policies offline and enable computationally efficient online replanning by performing policy inference. These approaches are designed and trained for terrain monitoring missions assuming a single specific map representation, which limits their applicability to different terrains. To address this issue, we propose a novel formulation of the adaptive informative path planning problem unified across different map representations, enabling training and deploying planning policies in a larger variety of monitoring missions. Experimental results validate that our novel formulation easily integrates with classical non-learning-based planning approaches while maintaining their performance. Our trained planning policy performs similarly to state-of-the-art policies trained for specific map representations. We validate our learned policy on unseen real-world terrain datasets.
The paper can be found on arXiv: https://arxiv.org/abs/2410.17166
If you find this work useful for your own research, please consider citing it:
@article{rueckin2025arxiv,
title={{Towards Map-Agnostic Policies for Adaptive Informative Path Planning}},
author={R{\"u}ckin, Julius and Morilla-Cabello, David and Stachniss, Cyrill and Montijano, Eduardo and Popovi{\'c}, Marija},
journal={arXiv preprint, arXiv:2410.17166},
year={2024},
}
We argue that the broad pool of existing adaptive informative path planning (IPP) approaches should be viewed along two dimensions: the map-specific formulation modelling the adaptive IPP problem and the algorithm used to train the planning policy offline or search it online. The formulation of the adaptive IPP problem is the most critical design decision for ensuring that planning policies are applicable across various terrain monitoring missions in a unified way. This motivates the need for a map-agnostic formulation of the adaptive IPP terrain monitoring problem that directly integrates with any learning-based or non-learning-based policy search algorithm used for adaptive IPP. In particular, such a formulation enables training and deploying learned policies in widely varying terrain monitoring missions.
The main contribution of this paper is such a novel map-agnostic formulation of the adaptive IPP problem for terrain monitoring, illustrated in the figure below. Our formulation unifies continuous-valued, i.e. regression, and discrete-valued, i.e. classification, terrain feature monitoring for adaptive IPP policies. To achieve this, we unify the state space representations across the terrain map representations used for online replanning. Based on this unified state space and a new reward function, we train and deploy a single, generally applicable planning policy on previously unseen variations of terrain monitoring missions using reinforcement learning (RL).
Our approach is conceptually depicted in the figure below. We unify the adaptive IPP problem formulation across the different map representations required to spatially capture various continuous- and discrete-valued terrain features. To this end, we view any terrain monitoring mission as a binary classification task, probabilistically splitting the terrain into unknown interesting areas and uninteresting areas. Based on this belief over interesting areas, we propose a map-agnostic planning state space and introduce a reward function to solve online, or train offline, a planning policy across terrain monitoring missions with different map representations. Finally, we show how our state formulation and reward function are used to train adaptive IPP policies offline on varying terrain monitoring missions in simulation using RL.
Our unified belief over interesting areas for continuous-valued (left) and discrete-valued (right) terrain features. Grey areas are unknown with large map uncertainty. (Left) Posterior normal distributions inferred from a Gaussian process or Kalman filter map representation with an interesting-value threshold of 0.6. The unified belief is given by the orange area under the curve above the threshold, which is larger for known interesting areas than for unknown, uncertain areas. (Right) The unified belief is given by the sum of posterior probability masses over the interesting classes (orange) extracted from an occupancy map representation.
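To make the unified belief concrete, the following minimal sketch (not the repository's implementation; function names, the example class labels, and all numbers are illustrative) computes the belief of a single map cell being interesting for both cases: the Gaussian probability mass above the interesting-value threshold for continuous-valued features, and the summed posterior probabilities of the interesting classes for discrete-valued features.

from scipy.stats import norm

def belief_continuous(mean, std, threshold=0.6):
    # Continuous-valued features (Gaussian process / Kalman filter map):
    # probability mass of the posterior normal distribution above the
    # interesting-value threshold, i.e. the orange area under the curve.
    return 1.0 - norm.cdf(threshold, loc=mean, scale=std)

def belief_discrete(class_probs, interesting_classes):
    # Discrete-valued features (occupancy map): sum of posterior
    # probability masses over the interesting classes.
    return sum(class_probs[c] for c in interesting_classes)

# Known interesting cell (mean above threshold, low uncertainty) vs. unknown cell.
print(belief_continuous(mean=0.8, std=0.05))  # close to 1: confidently interesting
print(belief_continuous(mean=0.6, std=0.30))  # 0.5: unknown, maximally uncertain
print(belief_discrete({"crop": 0.7, "weed": 0.2, "soil": 0.1}, {"crop", "weed"}))  # 0.9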
Clone repository first:
git clone git@github.com:dmar-bonn/ipp-rl-gen.git
cd ipp-rl-gen
Install anaconda on your host system.
Create a new conda environment:
conda create -n ipp-rl-gen python=3.8
Activate conda environment:
conda activate ipp-rl-gen
Install required dependencies:
pip3 install -r requirements.txt
pip3 install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv torch_geometric -f https://data.pyg.org/whl/torch-2.2.0+cu121.html
Build docker image:
docker build -t ipp_rl_gen .
You may need to install the NVIDIA Container Toolkit to make GPUs available inside the Docker container.
Run scripts or bash commands inside docker container:
docker run --rm --gpus all -v $(pwd):/ipp_rl_gen -it ipp_rl_gen bash -c "<YOUR_BASH_COMMAND>"
To train an RL agent with a pre-specified yaml configuration, run:
python3 train.py -c <PATH_TO_CONFIG_YAML>
or using the Docker setup, run:
docker run --rm --gpus all -v $(pwd):/ipp_rl_gen -it ipp_rl_gen bash -c "python3 train.py -c <PATH_TO_CONFIG_YAML>"
You can use config/config.yml as an example configuration.
To evaluate your trained RL agents or non-learning-based baseline planning algorithms for a certain number of episodes, use the evaluation script, which loads a previously saved checkpoint of the agent's policy:
python3 eval.py --config <PATH_TO_CONFIG_YAML> --checkpoint <PATH_TO_SB3_CHECKPOINT> --num_episodes <NUM_EVAL_EPISODES> --log_path <PATH_TO_LOG_DIR>
The results are saved as a .npz file to disk in the <PATH_TO_LOG_DIR> folder.
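The exact arrays stored in the .npz file depend on the evaluation configuration; a minimal sketch for inspecting them with NumPy (the file name below is a placeholder, not necessarily the one written by eval.py) could look as follows:

import numpy as np

results = np.load("<PATH_TO_LOG_DIR>/eval_results.npz")  # placeholder file name
for key in results.files:
    print(key, results[key].shape)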
The framework is built in a modular way consisting of three main modules. First, the simulator (green) generates randomised continuous- or discrete-valued terrain fields and simulates fixed- or varying-altitude UGV- or UAV-like movements and sensor measurements. Second, the state representation (purple) builds on top of provided map representations, such as Gaussian processes and occupancy mapping, to define the policy's state space following image-like or graph-like representations. Based on the state representation and configurable or customisable reward functions (solid red), a user-configurable or newly integrated RL algorithm (blue) can be used to train CNN-based or GNN-based policy networks (transparent red). Non-learning-based IPP methods directly replace the RL algorithm and, instead of performing policy network inference, plan directly on the state representation and reward function at deployment.
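As a rough illustration of this modularity, the sketch below assumes a Gym-style environment interface and a Stable-Baselines3 agent (as suggested by the SB3 checkpoints above); all class and method names, observation shapes, and the action space are hypothetical and not the repository's API.

import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO

class TerrainMonitoringEnv(gym.Env):
    # Hypothetical composition of the three modules: a simulator producing
    # terrain fields and measurements, a map-agnostic state representation,
    # and a reward function, exposed through a Gym-style interface.
    def __init__(self, simulator, state_repr, reward_fn):
        self.simulator, self.state_repr, self.reward_fn = simulator, state_repr, reward_fn
        # Image-like state, e.g. a belief channel plus an uncertainty channel.
        self.observation_space = gym.spaces.Box(0.0, 1.0, shape=(2, 64, 64), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(8)  # e.g. candidate waypoints

    def reset(self, seed=None, options=None):
        self.simulator.reset()
        return self.state_repr.build(self.simulator), {}

    def step(self, action):
        measurement = self.simulator.move_and_measure(action)
        state = self.state_repr.update(measurement)
        reward = self.reward_fn(state)
        return state, reward, self.simulator.done(), False, {}

# A CNN-based policy trained with a user-chosen RL algorithm (here PPO);
# non-learning-based planners would instead plan directly on the state and reward.
# env = TerrainMonitoringEnv(simulator, state_repr, reward_fn)
# model = PPO("CnnPolicy", env).learn(total_timesteps=100_000)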
This work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy, EXC-2070 -- 390732324 (PhenoRob), DGA project T45_23R, MCIN/AEI/ERDF/European Union NextGenerationEU/PRTR project PID2021-125514NB-I00, ONR grant N62909-24-1-2081 and grant FPU20-06563.