Code for the NeurIPS 2025 paper: Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning.
Offline reinforcement learning (RL) suffers from extrapolation errors induced by out-of-distribution (OOD) actions. To address this, offline RL algorithms typically impose constraints on action selection, which can be systematically categorized into density, support, and sample constraints. However, we show that each category has inherent limitations: density and sample constraints tend to be overly conservative in many scenarios, while the support constraint, though least restrictive, faces challenges in accurately modeling the behavior policy. To overcome these limitations, we propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. Theoretically, the constraint not only bounds extrapolation errors and distribution shift under certain conditions, but also approximates the support constraint without requiring behavior policy modeling. Moreover, it retains substantial flexibility and enables pointwise conservatism by adapting the neighborhood radius for each data point. In practice, we employ data quality as the adaptation criterion and design an adaptive neighborhood constraint. Building on an efficient bilevel optimization framework, we develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint. Empirically, ANQ achieves state-of-the-art performance on standard offline RL benchmarks and exhibits strong robustness in scenarios with noisy or limited data.
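For intuition only, below is a minimal PyTorch sketch of the core idea of a neighborhood constraint on Bellman target actions: each candidate action is projected into a small ball around the corresponding dataset action, with a per-sample radius. The function and variable names, the L-infinity projection, and the fixed radius in the toy usage are illustrative placeholders, not the ANQ implementation in this repository, which additionally adapts the radius to data quality via bilevel optimization.

import torch

def neighborhood_target_actions(dataset_actions, proposed_actions, eps):
    # Project each proposed action into the L-infinity ball of radius eps
    # around the corresponding dataset action (eps may vary per sample).
    lower = dataset_actions - eps
    upper = dataset_actions + eps
    return torch.maximum(torch.minimum(proposed_actions, upper), lower)

# Toy usage: constrained target actions for a batch of transitions. In ANQ the
# radius is adapted per data point; here it is a fixed placeholder value.
B, act_dim = 32, 6
a_data = 2 * torch.rand(B, act_dim) - 1    # dataset actions in [-1, 1]
a_pi = 2 * torch.rand(B, act_dim) - 1      # actions proposed for the Bellman target
eps = torch.full((B, 1), 0.1)              # placeholder radius
a_target = neighborhood_target_actions(a_data, a_pi, eps)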
The experiments in our paper were conducted using the following configuration:
Run the following script for offline RL training with ANQ:
bash ./run_experiments.sh
This script will launch a series of experiments on the D4RL benchmark environments. To customize the experiment setup, such as the environment, the random seed, or the training hyperparameters, you can either modify the script or pass parameters directly on the command line.
Example:
python main.py --env halfcheetah-medium-v2 --alpha 1 --lam 0.1 --seed 0
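To sweep over several environments and seeds, a small launcher along the lines of the sketch below may be convenient. The flag names are taken from the example above and the task list is illustrative; check the argument parsing in main.py or run_experiments.sh for the full set of options this codebase actually exposes.

import subprocess

ENVS = ["halfcheetah-medium-v2", "hopper-medium-v2"]   # illustrative D4RL tasks
SEEDS = [0, 1, 2]

for env in ENVS:
    for seed in SEEDS:
        cmd = [
            "python", "main.py",
            "--env", env,
            "--alpha", "1",
            "--lam", "0.1",
            "--seed", str(seed),
        ]
        print("Launching:", " ".join(cmd))
        subprocess.run(cmd, check=True)   # runs sequentially; parallelize as needed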
This codebase uses TensorBoard for logging metrics such as episode returns, Q values, and training losses.
To visualize training logs:
tensorboard --logdir <run_dir>
Logs are saved in runs/ by default.
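To inspect logs programmatically (for example, to aggregate returns across seeds), TensorBoard's event reader can be used roughly as sketched below. The run directory and scalar tag are placeholders; print the available tags to see what this codebase actually logs.

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("runs/halfcheetah-medium-v2_seed0")  # placeholder run directory
acc.Reload()

print("Available scalar tags:", acc.Tags()["scalars"])

tag = "eval/episode_return"   # placeholder tag; pick one from the list above
if tag in acc.Tags()["scalars"]:
    events = acc.Scalars(tag)
    steps = [e.step for e in events]
    values = [e.value for e in events]
    print(f"{tag}: {len(values)} points, final value {values[-1]:.2f}")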
