Banks is the linguist professor who will help you generate meaningful
LLM prompts using a template language that makes sense. If you're still using f-strings
for the job, keep reading.
Docs are available here.
Table of Contents
pip install banks
Prompts are instrumental for the success of any LLM application, and Banks focuses around specific areas of their lifecycle:
- π Templating: Banks provides tools and functions to build prompts text and chat messages from generic blueprints.
- ποΈ Versioning and metadata: Banks supports attaching metadata to prompts to ease their management, and versioning is first-class citizen.
- ποΈ Management: Banks provides ways to store prompts on disk along with their metadata.
For a more extensive set of code examples, see the documentation page.
You'll find yourself feeding an LLM a list of chat messages instead of plain text more often than not. Banks will help you remove the boilerplate by defining the messages already at the prompt level.
from banks import Prompt
prompt_template = """
{% chat role="system" %}
You are a {{ persona }}.
{% endchat %}
{% chat role="user" %}
Hello, how are you?
{% endchat %}
"""
p = Prompt(prompt_template)
print(p.chat_messages({"persona": "helpful assistant"}))
# Output:
# [
# ChatMessage(role='system', content='You are a helpful assistant.\n'),
# ChatMessage(role='user', content='Hello, how are you?\n')
# ]
Sometimes it might be useful to ask another LLM to generate examples for you in a
few-shots prompt. Provided you have a valid OpenAI API key stored in an env var
called OPENAI_API_KEY
you can ask Banks to do something like this (note we can
annotate the prompt using comments - anything within {# ... #}
will be removed
from the final prompt):
from banks import Prompt
prompt_template = """
{% set examples %}
{% completion model="gpt-3.5-turbo-0125" %}
{% chat role="system" %}You are a helpful assistant{% endchat %}
{% chat role="user" %}Generate a bullet list of 3 tweets with a positive sentiment.{% endchat %}
{% endcompletion %}
{% endset %}
{# output the response content #}
Generate a tweet about the topic {{ topic }} with a positive sentiment.
Examples:
{{ examples }}
"""
p = Prompt(prompt_template)
print(p.text({"topic": "climate change"}))
The output would be something similar to the following:
Generate a tweet about the topic climate change with a positive sentiment.
Examples:
- "Feeling grateful for the sunshine today! π #thankful #blessed"
- "Just had a great workout and feeling so energized! πͺ #fitness #healthyliving"
- "Spent the day with loved ones and my heart is so full. π #familytime #grateful"
Important
The completion
extension uses LiteLLM under the hood, and provided you have the
proper environment variables set, you can use any model from the supported model providers.
Note
Banks uses a cache to avoid generating text again for the same template with the same context. By default the cache is in-memory but it can be customized.
Banks provides a filter tool
that can be used to convert a callable passed to a prompt into an LLM function call.
Docstrings are used to describe the tool and its arguments, and during prompt rendering Banks will perform all the LLM
roundtrips needed in case the model wants to use a tool within a {% completion %}
block. For example:
import platform
from banks import Prompt
def get_laptop_info():
"""Get information about the user laptop.
For example, it returns the operating system and version, along with hardware and network specs."""
return str(platform.uname())
p = Prompt("""
{% set response %}
{% completion model="gpt-3.5-turbo-0125" %}
{% chat role="user" %}{{ query }}{% endchat %}
{{ get_laptop_info | tool }}
{% endcompletion %}
{% endset %}
{# the variable 'response' contains the result #}
{{ response }}
""")
print(p.text({"query": "Can you guess the name of my laptop?", "get_laptop_info": get_laptop_info}))
# Output:
# Based on the information provided, the name of your laptop is likely "MacGiver."
Several inference providers support prompt caching to save time and costs, and Anthropic in particular offers fine-grained control over the parts of the prompt that we want to cache. With Banks this is as simple as using a template filter:
prompt_template = """
{% chat role="user" %}
Analyze this book:
{# Only this part of the chat message (the book content) will be cached #}
{{ book | cache_control("ephemeral") }}
What is the title of this book? Only output the title.
{% endchat %}
"""
p = Prompt(prompt_template)
print(p.chat_messages({"book":"This is a short book!"}))
# Output:
# [
# ChatMessage(role='user', content=[
# ContentBlock(type='text', text='Analyze this book:\n\n'),
# ContentBlock(type='text', cache_control=CacheControl(type='ephemeral'), text='This is a short book!'),
# ContentBlock(type='text', text='\n\nWhat is the title of this book? Only output the title.\n')
# ])
# ]
The output of p.chat_messages()
can be fed to the Anthropic client directly.
We can get the same result as the previous example loading the prompt template from a registry
instead of hardcoding it into the Python code. For convenience, Banks comes with a few registry types
you can use to store your templates. For example, the DirectoryTemplateRegistry
can load templates
from a directory in the file system. Suppose you have a folder called templates
in the current path,
and the folder contains a file called blog.jinja
. You can load the prompt template like this:
from banks import Prompt
from banks.registries import DirectoryTemplateRegistry
registry = DirectoryTemplateRegistry(populated_dir)
prompt = registry.get(name="blog")
print(prompt.text({"topic": "retrogame computing"}))
To run banks within an asyncio
loop you have to do two things:
- set the environment variable
BANKS_ASYNC_ENABLED=true
. - use the
AsyncPrompt
class that has an awaitablerun
method.
Example:
from banks import AsyncPrompt
async def main():
p = AsyncPrompt("Write a blog article about the topic {{ topic }}")
result = await p.text({"topic": "AI frameworks"})
print(result)
asyncio.run(main())
banks
is distributed under the terms of the MIT license.