This is the repository with the planning-related experiments presented in the paper. For experiments on other benchmarks, such as Last Letter Concatenation, check out this repo.
- Linux
- Python 3.6+
- Install required packages with `pip install -r requirements.txt`
- Fast Downward
  - Use the version in planner_tools or download from here
  - Assign the path of the folder to the environment variable FAST_DOWNWARD: `FAST_DOWNWARD=/path/to/fast_downward`
- VAL
  - Use the version in planner_tools or download from here
  - Assign the path of the folder to the environment variable VAL: `VAL=/path/to/val`
- PR2Plan
  - Use the version in planner_tools or download and compile obs-compiler from here
  - Assign the path of the folder to the environment variable PR2: `PR2=/path/to/pr2plan`
- LLM access/setup (currently OpenAI/BLOOM)
Library requirements are provided in requirements.txt
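As a convenience, the setup can be scripted; the following is a minimal sketch that assumes a bash-like shell and uses the same placeholder paths as above (adjust them to your checkout):

```
# Point the pipeline at the planner tools (placeholder paths; adjust to your environment)
export FAST_DOWNWARD=/path/to/fast_downward
export VAL=/path/to/val
export PR2=/path/to/pr2plan

# Install the Python dependencies listed in requirements.txt
pip install -r requirements.txt
```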
`python3 prompt_generation.py -t TASK -c CONFIG [-ct COT_TYPE] [-si SPECIFIC-INSTANCES] [-re RANDOM-EXAMPLE] [-v VERBOSE] [-s SEED] [-ie] [-br BLOCKS_RANGE_START BLOCKS_RANGE_END] [-oe OVERRIDE_EXAMPLE]`
- --task: The task to run: "standard" or "cot"
- --config: The name of the config file to use. The config file must be a YAML file present in the configs folder. These configs decide the test problem distribution.
- -ie: If added as part of the command, the pipeline will ignore the already completed instances and rerun the entire pipeline. If not added, the pipeline will not redo already completed instances. Default is False.
- -si: If a list of instance ids is provided, the pipeline will only run the task on those instances. If not provided, the pipeline will run the task on all instances between the start and end provided in the config file. Default is None. For example, -si 1 2 3 4 5
- -re: If set to True, the example instance for each task will be randomly chosen from the set of instances. If set to False, the previous instance id will be used for the example prompt. Default is False.
- -v: If set to True, the pipeline will print the prompts, responses and evaluation. Default is False.
- -s: The seed to use for randomization. Default is 42.
- -ct: The type of chain of thought. Provide this if the task is "cot". For now the only option is "upb", which stands for Universal Plan Breakdown. Default is "none".
- -br: Range of blocks for Blocksworld. Default is 3 to 20.
- -oe: Override the current examples with examples from a different problem distribution: "st" - Progression Proof, "ds" - Domain Specific (Stacking), "lex" - Lexicographic Stacking.
This will generate the prompts for the given task and store them in the prompts folder as json files.
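For example, a chain-of-thought prompt generation run might look like the following; CONFIG_NAME is a placeholder for one of the YAML files in the configs folder, and the instance ids are arbitrary:

```
# Hypothetical invocation: CONFIG_NAME stands for a config file in the configs folder
python3 prompt_generation.py -t cot -ct upb -c CONFIG_NAME -si 1 2 3 -s 42 -v True
```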
`python3 response_generation.py -t TASK -c CONFIG --engine ENGINE [-ct COT_TYPE] [-temp TEMPERATURE] [-si SPECIFIC-INSTANCES] [-re RANDOM-EXAMPLE] [-v VERBOSE] [-s SEED] [-ie] [-oe OVERRIDE_EXAMPLE]`
This will generate the responses for the given task using the generated prompts. The generated responses are appended to the prompt jsons and are stored in the responses folder.
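A hedged example, mirroring the prompt generation call above; the engine name and temperature shown here are assumptions, so check the script for the engines it actually supports (currently OpenAI/BLOOM models):

```
# Hypothetical invocation: gpt-4 and temperature 0.0 are assumed values, not repo defaults
python3 response_generation.py -t cot -ct upb -c CONFIG_NAME --engine gpt-4 -temp 0.0
```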
`python3 response_evaluation.py -t TASK -c CONFIG --engine ENGINE [-ct COT_TYPE] [-temp TEMPERATURE] [-si SPECIFIC-INSTANCES] [-re RANDOM-EXAMPLE] [-v VERBOSE] [-s SEED] [-ie] [-oe OVERRIDE_EXAMPLE]`
This will evaluate the raw responses generated by the model. The evaluation is appended to the response jsons and the final results are stored in the results folder.
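A hedged example, reusing the same flags as the response generation step so that the evaluation picks up the matching response jsons:

```
# Hypothetical invocation: use the same task/config/engine as the response generation run
python3 response_evaluation.py -t cot -ct upb -c CONFIG_NAME --engine gpt-4 -temp 0.0 -v True
```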
@inproceedings{
stechly2024chain,
title={Chain of Thoughtlessness? An Analysis of CoT in Planning},
author={Kaya Stechly and Karthik Valmeekam and Subbarao Kambhampati},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=kPBEAZU5Nm}
}