This repository is a playground for exploring different approaches to deductive reasoning with LLMs. The project focuses on the creation, improvement, and reuse of logical formalisations as part of autoformalisation.
The methodology, data, and experimental results are described in our [paper](https://arxiv.org/abs/2502.04352) (accepted at ECAI 2025). If you build on this work, please cite:
@misc{hoppe_investigating_2025,
  title = {Investigating the Robustness of Deductive Reasoning with Large Language Models},
  author = {Hoppe, Fabian and Ilievski, Filip and Kalo, Jan-Christoph},
  url = {http://arxiv.org/abs/2502.04352},
  doi = {10.48550/arXiv.2502.04352},
}
At the moment the documentation of this repository is limited, but it may be extended in the future. The repository provides everything you need to reproduce our experiments and, hopefully, to build upon our work.
Feel free to contact me in case you identify bugs or run into other issues while using this project.
The following sections are a work in progress and provide some additional documentation for the repository.
The framework relies on Hydra for configuration management and MLFlow for experiment tracking; parts of the autoformalisation pipeline are implemented using LangGraph.
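To illustrate how these pieces typically fit together, here is a minimal sketch of a Hydra entry point that logs a run to MLFlow. The function, config name, experiment name, and metric are illustrative assumptions and do not correspond to the repository's actual code; the LangGraph part of the pipeline is omitted for brevity.

```python
# Minimal sketch (not the repository's actual entry point) of combining
# Hydra configuration with MLFlow experiment tracking.
import hydra
import mlflow
from omegaconf import DictConfig, OmegaConf


def run_evaluation(cfg: DictConfig) -> float:
    """Placeholder for the actual autoformalisation / reasoning pipeline."""
    return 0.0


@hydra.main(version_base=None, config_path="configs", config_name="eval")
def main(cfg: DictConfig) -> None:
    mlflow.set_experiment("deductive-reasoning-eval")  # hypothetical experiment name
    with mlflow.start_run():
        # Log the resolved Hydra config so every run stays reproducible.
        mlflow.log_params(OmegaConf.to_container(cfg, resolve=True))
        mlflow.log_metric("accuracy", run_evaluation(cfg))


if __name__ == "__main__":
    main()
```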
- src/langdeductive: Python package grouping all deductive reasoning models.
- configs: YAML configuration files for the framework.
- tests: Stub for unit tests.
- data: Folder for datasets (I use softlinks).
- outputs: Default folder for results.
Experiments:
- evaluate.py: Runs the evaluation on a set of datasets.
- summarize.py: Summarizes previous evaluation results from the MLflow tracking server (see the sketch below for the general idea).
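As a rough illustration of what such a summarisation step can look like, the snippet below queries previous runs from MLflow. The experiment name is a hypothetical placeholder, and this is not the repository's actual summarize.py.

```python
# Sketch: query previous evaluation runs from the MLflow tracking server.
# The experiment name is a hypothetical placeholder.
import mlflow

runs = mlflow.search_runs(experiment_names=["deductive-reasoning-eval"])

if runs.empty:
    print("No runs logged yet.")
else:
    # search_runs returns a pandas DataFrame with one row per run;
    # logged metrics appear as 'metrics.<name>' columns.
    metric_columns = [c for c in runs.columns if c.startswith("metrics.")]
    print(runs[["run_id", "status", *metric_columns]].head())
```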
The project contains a pyproject.toml file listing all dependencies (setting up the project should be possible with uv sync).
Running the experiments requires several environment variables that should be specified in a .env file:
PROJECT_ROOT="<absolute path to project dir>"
OPENAI_API_KEY=""
HUGGINGFACE_KEY=""
MLFLOW_TRACKING_USERNAME=""
MLFLOW_TRACKING_PASSWORD=""
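One common way to pick these variables up in Python is python-dotenv; whether the framework actually loads the .env file this way is an assumption, so treat the snippet below as a sketch only.

```python
# Sketch: load the .env file and verify the required variables are present.
# Assumes python-dotenv is installed; the framework may load them differently.
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

required = [
    "PROJECT_ROOT",
    "OPENAI_API_KEY",
    "HUGGINGFACE_KEY",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```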
Hydra composes the experiment configuration from the YAML files in configs, and MLFlow is used to track and store the results of each run. A default evaluation can be started with:
python -m evaluate --config-name eval