Skip to content

GasStationManager/LeanTool

Repository files navigation

LeanTool: Helping LLMs Be Better at Lean

LeanTool is a simple utility that connects LLMs with a "Code Interpreter" for Lean. This is implemented as tool calls from the LLM to the Lean executable, hence the name.

Current LLMs often have trouble with outputing code with correct Lean 4 syntax, due to the recent rapid changes in the Lean language and its libraries. By allowing LLMs to talk directly to Lean, they are given opportunities to fix their mistakes. Furthermore, Lean being an interactive theorem prover, there are many interactive features that are not well represented in training data. This utility includes some initial efforts on prompting the LLMs with instructions on using these interactive features to better produce proofs.

Our design goal is to be flexible: easy to plug into automated workflows, but can also be plugged into human-facing chat/copilot interfaces.

This is part of a broader effort to create safe and hallucination-free coding AIs.

Overview of Features

  • Uses LiteLLM so you can plug in any compatible LLM, from OpenAI and Anthropic APIs to local LLMs hosted via ollama or vLLM.
  • Feedback loop that allows the LLM to fix its errors.
  • Uses Pantograph to extract goal states from sorrys. This facilitates breaking complex proof tasks down into simpler subtasks, e.g. draft-sketch-prove.
  • System prompt instructions to utilize Lean features that are likely missing from the LLMs' training data, including interactive commands that elicit suggestions / information from Lean
  • Option to pass code in plain text mode instead of as tool calls formatted in JSON. This allows LeanTool to be used by models that do not yet support tool/function calls, including some reasoning models like Deepseek r1 and Gemini-2-flash-thinking.
  • Plugin system to allow optional features to be included at run time.
  • Flexible usage: as python library, as command-line chat interface, as OpenAI-compatible API server, or as Model Context Protocol (MCP) server. Supports a wide range of coding assistants that can utilize custom OpenAI-compatible APIs and/or MCP servers, including Cursor, Aider, Cline, and Claude Code.

API Server Demo

A demo of the OpenAI-compatible API server is up at http://www.codeproofarena.com:8800/v1. To use it, connect your app to the above URL as the API Base URL, "provider" as OpenAI or OpenAI-compatible, "model" as one of the key names in the models dict in leantool.py, and API key as your API key for the chosen model. See below for specific set up details for OpenWebUI, Continue.dev, Cline and Aider.

Installation

  • Install Lean
  • Install poetry
  • Clone the repository
  • Install Pantograph by following its instructions. Create the wheel file.
  • Modify pyproject.toml in the LeanTool directory, to ensure the pantograph entry points to the correct path and file name to the .whl file.
  • poetry install
  • Install Mathlib as needed
  • Set up your LLM model to connect via LiteLLM. E.g. for OpenAI, just set the environmental variable OPENAI_API_KEY. For Anthropic, ANTHROPIC_API_KEY. If you want to try many different models, sign up for an OpenRouter API key and set OPENROUTER_API_KEY. For local models served by ollama, start by installing ollama. See Relevant LiteLLM Docs for more detailed instructions. The models dict in leantool.py has some preset models; it has the format "short name" : "LiteLLM model name". Modify it to have an entry for your model if you have something different.

Python Library and Built-in Interfaces

  • leantool.py is the Python library. Simply import the file and call interactive_lean_check to invoke the feedback loop. Currently used by FormalizeWithTest autoformalization project, and WakingUp experiments on hallucination detection.
  • Plugins are optional features that users can choose to include at run time. There are a few built-in ones, but you can also implement your own and pass to the interactive_lean_check call. Here is a tentative interface design. A plugin is a python object that has the following members:
    • sys_msg: a string that will be attached to the system message
    • async def process(self, code, result): a method that will be executed after the main Lean executable finishes. Takes in the LLM submitted code, and result a dict that records the results of the processing so far. The method should return the new result dict.
  • cli_chat.py command line chat interface. Simply run poetry run python cli_chat.py.
  • app.py Streamlit chat interface.

OpenAI-compatible Proxy Server

  • lean-api-server-flask.py OpenAI API compatible proxy server. Can be plugged into any application that takes a OpenAI API model with custom base URL. Can either use the API keys set in the environment variables, or take an API key token in the request, which is then passed to the corresponding LLM. Has been tested to work with OpenWebUI, a fully featured chat interface, and coding assistants Continue, Cline, and Aider.

Example Set Up with OpenWebUI

  • After the Installation steps above, the following command will launch the API server at http://localhost:8000/v1:
poetry run python lean-api-server-flask.py

Example Set Up with Continue.dev

  • After the API server is running, install Continue as a VS Code extension.
  • Follow the instructions here to set up an OpenAI-compatible model by specifying an apiBase url. Set the model name to be the key name of your chosen model in the models dict in leantool.py, e.g. "sonnet". For the apiKey field you may provide your API key for the chosen model.

Example Set Up with Cline

  • Install the Cline VS Code extension.
  • Set the model type to be OpenAI-compatible, and provide the base url. Set the model name for your chosen model, e.g. "sonnet", and your API key.

Example Set Up with Aider

  • After the API server is running, install Aider.
  • Connect to the server's models as OpenAI-compatible API models: see instructions. E.g. for Mac/Linux:
export OPENAI_API_BASE=<endpoint for the API server>
export OPENAI_API_KEY=<key for your chosen model>

# Prefix the model name with openai/
aider --model openai/sonnet

Model Context Protocol (MCP) Server

  • leanmcp.py is a Model Context Protocol (MCP) server. This exports Lean (and plugins including load_sorry) as a MCP tool, without the LLM and the feedback loop. Works with apps that can utilize MCP servers, and are able to manage the feedback loop within the app. Has been tested to work with Claude Code, Cursor and Goose.
  • Can be run in stdio mode: e.g. when configuring your app for MCP, fill in the command poetry run python leanmcp.py
  • Can also serve over the network in sse mode: e.g. run poetry run python leanmcp.py --sse --port 8008, then fill in the URL http://<your-host-or-ip-address>:8008/sse in your app's configuration.

Example Set Up with Claude Code

  • Install Claude Code
  • Add leanmcp.py as an mcp server. e.g. in the LeanTool repo directory:
claude mcp add LeanTool poetry run python leanmcp.py
poetry shell
claude  

Example Set Up with Cursor

  • Set up the MCP server in Cursor, see instructions. E.g. for sse mode, run the server with poetry run python leanmcp.py --sse --port 8008, then in Cursor, go to Cursor Settings > Features > MCP, click on the + Add New MCP Server button, and fill in the URL http://<your-host-or-ip-address>:8008/sse.
  • Test the set up. You may want to explicitly ask the LLM to use the tool in your prompt. If needed, add additional instructions in the Rules for AI setting.
  • Example test prompt: "State a theorem in Lean 4 that n*(n+1) is even, for all natural numbers n. Write sorry in place of the proof. Pass the code to the provided tool to check for syntax, and show me its output". Cursor will show the MCP tool call; you may need to click the Run tool button to approve the call.