This repository studies LLMs’ ability to follow many fine-grained constraints. Unlike typical instruction-following tasks, we focus on scenarios where the model must identify relevant rules from long lists of guidelines (e.g., travel advice or code documentation) and generate valid outputs.
We introduce:
- Large-Scale Constraint Generation: A framework to test whether LLMs can pick relevant constraints and apply them correctly.
- Words Checker: A practical task where the model classifies sentences as valid or invalid based on increasingly large lists of forbidden words.
- FoCusNet: A small model (~300k parameters) that pre-selects relevant constraints to help the LLM focus, improving accuracy.
We evaluate multiple LLM families, sizes, and prompting strategies, showing that combining FoCusNet with an LLM outperforms standard approaches as the number of constraints grows.
Code and datasets are provided to reproduce all experiments.
Figure 1: In LSCG, the model must generate a valid answer while adhering to an input task and a long list of constraints. In the example, this can be done either by (a) directly interpreting the concatenated task and constraints or (b) using FoCusNet to extract the relevant constraints. The first approach may lead to inappropriate responses (e.g., offering beer to a Muslim; Naous et al. (2024)), while the second ensures valid answers.
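To make the two pipelines concrete, here is a minimal, illustrative sketch in Python. All names below (`llm`, `focusnet`, the function names) are hypothetical placeholders and do not correspond to the repository's actual API:

```python
# Illustrative sketch only: `llm` and `focusnet` are hypothetical callables,
# not part of this repository's code.

def is_valid(sentence: str, forbidden: list[str]) -> bool:
    """Words Checker ground truth: a sentence is invalid if it uses any forbidden word."""
    tokens = sentence.lower().split()
    return not any(word.lower() in tokens for word in forbidden)

def classify_direct(llm, sentence: str, forbidden: list[str]) -> str:
    """(a) Direct approach: the task and ALL constraints go into a single prompt."""
    prompt = (
        "Classify the sentence as VALID or INVALID.\n"
        f"Forbidden words: {', '.join(forbidden)}\n"
        f"Sentence: {sentence}"
    )
    return llm(prompt)

def classify_with_focusnet(llm, focusnet, sentence: str, forbidden: list[str],
                           threshold: float = 0.5) -> str:
    """(b) FoCusNet approach: a small model first scores each constraint for
    relevance to the input; only the selected constraints reach the LLM."""
    relevant = [w for w in forbidden if focusnet(sentence, w) > threshold]
    prompt = (
        "Classify the sentence as VALID or INVALID.\n"
        f"Forbidden words: {', '.join(relevant)}\n"
        f"Sentence: {sentence}"
    )
    return llm(prompt)
```

As the constraint list grows, approach (a) must sift through every constraint in a single prompt, while (b) keeps the LLM's effective context small by filtering first.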
First, install the project environment via:

```bash
conda env create -f environment.yml
```
Then, with the base environment active (run `conda deactivate` first if another environment is active), run:

```bash
chmod +x set_PYTHONPATH.sh
./set_PYTHONPATH.sh
```
This script ensures that your Python interpreter can resolve the project structure and successfully import the project's modules.
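If you want, you can check that the project root was picked up (the exact path depends on where you cloned the repository):

```bash
# Optional sanity check: the project root should now appear on PYTHONPATH
echo $PYTHONPATH
```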
Finally, activate the conda environment:

```bash
conda activate rule_constrainer
```
The simplest way to reproduce the results of the paper is to run all the subfolders of `experiments` in the given order (each subfolder name starts with a 2-digit identifier). In particular:
- Start by obtaining the dataset for Words Checker with `experiments/01_data_preparation`
- Train and evaluate FoCusNet with `experiments/02_model_tuning`
- (Optional) Test the performance of FoCusNet with `experiments/03_model_testing`
- Finally, play with different LLMs and prompting strategies in `experiments/04_LLM_baselines`