LeanTool is a simple utility that connects LLMs with a "Code Interpreter" for Lean. This is implemented as tool calls from the LLM to the Lean executable, hence the name.
Current LLMs often have trouble outputting code with correct Lean 4 syntax, due to the recent rapid changes in the Lean language and its libraries. By allowing LLMs to talk directly to Lean, they are given opportunities to fix their mistakes. Furthermore, since Lean is an interactive theorem prover, it has many interactive features that are not well represented in training data. This utility includes some initial efforts to prompt LLMs with instructions on using these interactive features to better produce proofs.
Our design goal is to be flexible: easy to plug into automated workflows, but can also be plugged into human-facing chat/copilot interfaces.
This is part of a broader effort to create safe and hallucination-free coding AIs.
- Uses LiteLLM so you can plug in any compatible LLM, from OpenAI and Anthropic APIs to local LLMs hosted via ollama or vLLM.
- Feedback loop that allows the LLM to fix its errors.
- Uses Pantograph to extract goal states from `sorry`s. This facilitates breaking complex proof tasks down into simpler subtasks, e.g. draft-sketch-prove. (See the sketch after this list.)
- System prompt instructions to utilize Lean features that are likely missing from the LLMs' training data, including interactive commands that elicit suggestions / information from Lean
- Option to pass code in plain text mode instead of as tool calls formatted in JSON. This allows LeanTool to be used by models that do not yet support tool/function calls, including some reasoning models like Deepseek r1 and Gemini-2-flash-thinking.
- Plugin system to allow optional features to be included at run time.
- Flexible usage: as python library, as command-line chat interface, as OpenAI-compatible API server, or as Model Context Protocol (MCP) server. Supports a wide range of coding assistants that can utilize custom OpenAI-compatible APIs and/or MCP servers, including Cursor, Aider, Cline, and Claude Code.
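To illustrate the `sorry` workflow above, a minimal sketch (the theorem is the same toy example used in the test prompt later in this README; the exact goal display may differ):

```lean
-- Submit a statement with the proof left as a placeholder:
theorem n_mul_succ_even (n : Nat) : 2 ∣ n * (n + 1) := by
  sorry
-- Pantograph extracts the goal state at the sorry, roughly:
--   n : Nat ⊢ 2 ∣ n * (n + 1)
```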
A demo of the OpenAI-compatible API server is up at http://www.codeproofarena.com:8800/v1. To use it, connect your app to the above URL as the API Base URL, set "provider" to OpenAI or OpenAI-compatible, "model" to one of the key names in the `models` dict in `leantool.py`, and the API key to your API key for the chosen model (see the sketch below). See below for specific setup details for OpenWebUI, Continue.dev, Cline and Aider.
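For example, since the server speaks the OpenAI API, you can exercise it with the standard `openai` Python client. A minimal sketch (the model key `"sonnet"` is one plausible entry in the `models` dict, and the API key shown is a placeholder):

```python
from openai import OpenAI

# Point the standard OpenAI client at the LeanTool demo server.
client = OpenAI(
    base_url="http://www.codeproofarena.com:8800/v1",
    api_key="YOUR_PROVIDER_API_KEY",  # key for the underlying model, e.g. Anthropic
)

response = client.chat.completions.create(
    model="sonnet",  # a short name from the models dict in leantool.py
    messages=[{"role": "user", "content": "Prove in Lean 4 that 2 + 2 = 4."}],
)
print(response.choices[0].message.content)
```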
- Install Lean
- Install `poetry`
- Clone the repository
- Install Pantograph by following its instructions. Create the wheel file.
- Modify `pyproject.toml` in the LeanTool directory, to ensure the `pantograph` entry points to the correct path and file name of the `.whl` file (a sketch follows this list).
- `poetry install`
- Install Mathlib as needed
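The `pantograph` entry is a poetry path dependency. A sketch of what it might look like (the path and wheel file name below are placeholders; use the name of the wheel you built):

```toml
[tool.poetry.dependencies]
# Hypothetical path and file name; point this at your built wheel.
pantograph = { path = "../PyPantograph/dist/pantograph-0.3.0-py3-none-any.whl" }
```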
- Set up your LLM model to connect via LiteLLM. E.g. for OpenAI, just set the environment variable `OPENAI_API_KEY`. For Anthropic, `ANTHROPIC_API_KEY`. If you want to try many different models, sign up for an OpenRouter API key and set `OPENROUTER_API_KEY`. For local models served by ollama, start by installing ollama. See the relevant LiteLLM docs for more detailed instructions. The `models` dict in `leantool.py` has some preset models; it has the format `"short name": "LiteLLM model name"`. Modify it to add an entry for your model if you have something different (see the sketch below).
`leantool.py` is the Python library. Simply import the file and call `interactive_lean_check` to invoke the feedback loop. Currently used by the FormalizeWithTest autoformalization project, and the WakingUp experiments on hallucination detection.
- Plugins are optional features that users can choose to include at run time. There are a few built-in ones, but you can also implement your own and pass it to the `interactive_lean_check` call. Here is a tentative interface design. A plugin is a Python object that has the following members (a sketch follows this list):
  - `sys_msg`: a string that will be attached to the system message
  - `async def process(self, code, result)`: a method that will be executed after the main Lean executable finishes. It takes the LLM-submitted code, and `result`, a dict that records the results of the processing so far. The method should return the new result dict.
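A minimal sketch of a custom plugin and how it might be passed to `interactive_lean_check` (the positional argument, the `plugins` keyword, and the result-dict keys below are assumptions for illustration; check the actual signature in `leantool.py`):

```python
import asyncio

from leantool import interactive_lean_check

class LineCountPlugin:
    """Toy plugin following the tentative interface described above."""

    # Attached to the system message sent to the LLM.
    sys_msg = "The checker also reports the line count of your submitted code."

    async def process(self, code, result):
        # Runs after the Lean executable finishes; extend and return the result dict.
        result["line_count"] = len(code.splitlines())
        return result

async def main():
    # Argument names here are illustrative, not the confirmed signature.
    result = await interactive_lean_check(
        "Prove in Lean 4 that n * (n + 1) is even for all natural numbers n.",
        plugins=[LineCountPlugin()],
    )
    print(result)

asyncio.run(main())
```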
`cli_chat.py` is a command-line chat interface. Simply run `poetry run python cli_chat.py`.
`app.py` is a Streamlit chat interface.
`lean-api-server-flask.py` is an OpenAI API compatible proxy server. It can be plugged into any application that takes an OpenAI API model with a custom base URL. It can either use the API keys set in the environment variables, or take an API key token in the request, which is then passed to the corresponding LLM. Has been tested to work with OpenWebUI, a fully featured chat interface, and the coding assistants Continue, Cline, and Aider.
- After the Installation steps above, the following command will launch the API server at `http://localhost:8000/v1`: `poetry run python lean-api-server-flask.py`
- Install OpenWebUI. If you go with the docker option, you will need to install docker first.
  Since our proxy server exposes an OpenAI-compatible API, you can use the docker command for installing OpenWebUI with OpenAI API, adding the command line options `--add-host host.docker.internal:host-gateway -e OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1` (a full command sketch follows this list).
- Access OpenWebUI at http://localhost:3000/.
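Putting the pieces together, the full command might look like the following (based on OpenWebUI's standard docker instructions at the time of writing; check their docs for the current image and options):

```
docker run -d -p 3000:8080 \
  --add-host host.docker.internal:host-gateway \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```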
- After the API server is running, install Continue as a VS Code extension.
- Follow the instructions here to set up an OpenAI-compatible model by specifying an `apiBase` url. Set the model name to be the key name of your chosen model in the `models` dict in `leantool.py`, e.g. "sonnet". For the `apiKey` field you may provide your API key for the chosen model. (A config sketch follows this list.)
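For reference, the relevant block of Continue's `config.json` might look like this (field names follow Continue's OpenAI-provider config; the title and key are placeholders):

```json
{
  "models": [
    {
      "title": "LeanTool (sonnet)",
      "provider": "openai",
      "model": "sonnet",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "YOUR_PROVIDER_API_KEY"
    }
  ]
}
```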
- Install the Cline VS Code extension.
- Set the model type to be OpenAI-compatible, and provide the base URL. Set the model name for your chosen model, e.g. "sonnet", and your API key.
- After the API server is running, install Aider.
- Connect to the server's models as OpenAI-compatible API models: see instructions. E.g. for Mac/Linux:
```
export OPENAI_API_BASE=<endpoint for the API server>
export OPENAI_API_KEY=<key for your chosen model>
# Prefix the model name with openai/
aider --model openai/sonnet
```
`leanmcp.py` is a Model Context Protocol (MCP) server. This exports Lean (and plugins, including `load_sorry`) as an MCP tool, without the LLM and the feedback loop. It works with apps that can utilize MCP servers and are able to manage the feedback loop within the app. Has been tested to work with Claude Code, Cursor and Goose.
- Can be run in `stdio` mode: e.g. when configuring your app for MCP, fill in the command `poetry run python leanmcp.py` (a typical client config sketch follows this list).
- Can also serve over the network in `sse` mode: e.g. run `poetry run python leanmcp.py --sse --port 8008`, then fill in the URL `http://<your-host-or-ip-address>:8008/sse` in your app's configuration.
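Many MCP clients are configured with a JSON block along these lines; a sketch for `stdio` mode (the exact file location and schema depend on the client — check your app's MCP documentation):

```json
{
  "mcpServers": {
    "LeanTool": {
      "command": "poetry",
      "args": ["run", "python", "leanmcp.py"]
    }
  }
}
```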
- Install Claude Code
- Add `leanmcp.py` as an MCP server, e.g. in the LeanTool repo directory:

```
claude mcp add LeanTool poetry run python leanmcp.py
poetry shell
claude
```
- Set up the MCP server in Cursor, see instructions. E.g. for `sse` mode, run the server with `poetry run python leanmcp.py --sse --port 8008`, then in Cursor, go to `Cursor Settings > Features > MCP`, click on the `+ Add New MCP Server` button, and fill in the URL `http://<your-host-or-ip-address>:8008/sse`. (Alternatively, a config-file sketch follows this list.)
- Test the setup. You may want to explicitly ask the LLM to use the tool in your prompt. If needed, add additional instructions in the Rules for AI setting.
- Example test prompt: "State a theorem in Lean 4 that n*(n+1) is even, for all natural numbers n. Write `sorry` in place of the proof. Pass the code to the provided tool to check for syntax, and show me its output". Cursor will show the MCP tool call; you may need to click the `Run tool` button to approve the call.
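As an alternative to the settings UI, recent Cursor versions support an `mcp.json` config file; a sketch for `sse` mode (the file name and schema are Cursor's at the time of writing — verify against Cursor's current MCP docs):

```json
{
  "mcpServers": {
    "LeanTool": {
      "url": "http://localhost:8008/sse"
    }
  }
}
```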