From ba1ccb3a4a8dbef7fc17c8fdec6c9f78a4ab137d Mon Sep 17 00:00:00 2001
From: Simon Willison
Date: Mon, 28 Oct 2024 15:46:52 -0700
Subject: [PATCH] Release 0.17a0

Refs #587, #590
---
 docs/changelog.md  | 34 ++++++++++++++++++++++++++++++++++
 docs/python-api.md |  3 +++
 docs/usage.md      |  3 ++-
 setup.py           |  2 +-
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/docs/changelog.md b/docs/changelog.md
index 161317cb..970b2e9b 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -1,5 +1,39 @@
 # Changelog
 
+(v0_17a0)=
+## 0.17a0 (2024-10-28)
+
+Alpha support for **attachments**, allowing multi-modal models to accept images, audio, video and other formats. [#578](https://github.com/simonw/llm/issues/578)
+
+Attachments {ref}`in the CLI <usage-attachments>` can be URLs:
+
+```bash
+llm "describe this image" \
+  -a https://static.simonwillison.net/static/2024/pelicans.jpg
+```
+Or file paths:
+```bash
+llm "extract text" -a image1.jpg -a image2.jpg
+```
+Or binary data, which may need to use `--attachment-type` to specify the MIME type:
+```bash
+cat image | llm "extract text" --attachment-type - image/jpeg
+```
+
+Attachments are also available {ref}`in the Python API <python-api-attachments>`:
+
+```python
+model = llm.get_model("gpt-4o-mini")
+response = model.prompt(
+    "Describe these images",
+    attachments=[
+        llm.Attachment(path="pelican.jpg"),
+        llm.Attachment(url="https://static.simonwillison.net/static/2024/pelicans.jpg"),
+    ]
+)
+```
+Plugins that provide alternative models can support attachments; see {ref}`advanced-model-plugins-attachments` for details.
+
 (v0_16)=
 ## 0.16 (2024-09-12)

diff --git a/docs/python-api.md b/docs/python-api.md
index dd553101..ae135a68 100644
--- a/docs/python-api.md
+++ b/docs/python-api.md
@@ -49,6 +49,9 @@ response = model.prompt(
     system="Answer like GlaDOS"
 )
 ```
+
+(python-api-attachments)=
+
 ### Attachments
 
 Models that accept multi-modal input (images, audio, video etc) can be passed attachments using the `attachments=` keyword argument.
 This accepts a list of `llm.Attachment()` instances.

diff --git a/docs/usage.md b/docs/usage.md
index 6b825245..94cb5ca5 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -45,6 +45,7 @@ Some models support options. You can pass these using `-o/--option name value` -
 ```bash
 llm 'Ten names for cheesecakes' -o temperature 1.5
 ```
+(usage-attachments)=
 ### Attachments
 
 Some models are multi-modal, which means they can accept input in more than just text. GPT-4o and GPT-4o mini can accept images, and models such as Google Gemini 1.5 can accept audio and video as well.
@@ -56,7 +57,7 @@ llm "describe this image" -a https://static.simonwillison.net/static/2024/pelica
 ```
 Attachments can be passed using URLs or file paths, and you can attach more than one attachment to a single prompt:
 ```bash
-llm "describe these images" -a image1.jpg -a image2.jpg
+llm "extract text" -a image1.jpg -a image2.jpg
 ```
 You can also pipe an attachment to LLM by using `-` as the filename:
 ```bash

diff --git a/setup.py b/setup.py
index 4c77a9e3..44669127 100644
--- a/setup.py
+++ b/setup.py
@@ -1,7 +1,7 @@
 from setuptools import setup, find_packages
 import os
 
-VERSION = "0.16"
+VERSION = "0.17a0"
 
 
 def get_long_description():