Log input tokens, output tokens and token details #642

Merged: 9 commits into main from usage, Nov 20, 2024
Conversation

simonw (Owner) commented Nov 20, 2024

Refs:

TODO:

  • Log input/output/details to new columns on responses table.
  • llm prompt -u/--usage option
  • Add token usage information to markdown llm logs output
  • Implement this in at least one other plugin to check it makes sense
  • Update plugin docs to explain response.set_usage() (a sketch follows this list)
  • Document how to use this in the Python API (I'll need this myself for the datasette-llm package). I need to document Response generally; I'll do that in a new issue.
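
For illustration, here is a minimal sketch of what a plugin's execute() could look like once it reports usage. The toy model and its word-count token numbers are hypothetical; the response.set_usage() call matches the diffs later in this thread.

import llm

class EchoModel(llm.Model):
    # Hypothetical toy model that echoes the prompt back
    model_id = "echo-usage-demo"

    def execute(self, prompt, stream, response, conversation):
        text = prompt.prompt or ""
        yield text
        # Record token counts so llm can log them to the new columns.
        # A real plugin would take these from its API's usage data;
        # word counts stand in for tokens here.
        response.set_usage(input=len(text.split()), output=len(text.split()))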

simonw (Owner, Author) commented Nov 20, 2024

I'm going to omit the token information from llm logs markdown unless the user specifies -u/--usage (I'll keep it on the JSON by default though).
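
For example, assuming the flags land as described here:

llm logs --json   # token usage stays in the JSON output by default
llm logs -u       # opt in to token usage in the markdown output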

simonw (Owner, Author) commented Nov 20, 2024

End output of llm logs -u now:

...

Example Command:

If you have a SQLite database named texts.db with a table documents containing a text column content, the command would look like this:

llm embed-multi my-texts \
  --sql "SELECT id, content FROM documents" \
  --model ada-002 \
  --store

Replace ada-002 with the embedding model that you wish to use for processing the text. Adjust the SQL query to fit your actual table structure.

This will process all entries in the documents table and store the embeddings in the my-texts collection.

Token usage:

30,791 input, 30,791 output, {"prompt_tokens_details": {"cached_tokens": 30592}}
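
For a quick look at what gets logged, a hypothetical sqlite3 snippet; the column names are an assumption based on the TODO item about new columns on the responses table:

import sqlite3

# The database lives wherever `llm logs path` points
db = sqlite3.connect("logs.db")
rows = db.execute(
    "select model, input_tokens, output_tokens, token_details "
    "from responses order by id desc limit 3"
)
for row in rows:
    print(row)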

simonw (Owner, Author) commented Nov 20, 2024

This diff to llm-claude-3 logged token counts correctly:

diff --git a/llm_claude_3.py b/llm_claude_3.py
index a05b01b..281084e 100644
--- a/llm_claude_3.py
+++ b/llm_claude_3.py
@@ -240,16 +240,23 @@ class ClaudeMessages(_Shared, llm.Model):
     def execute(self, prompt, stream, response, conversation):
         client = Anthropic(api_key=self.get_key())
         kwargs = self.build_kwargs(prompt, conversation)
+        usage = None
         if stream:
             with client.messages.stream(**kwargs) as stream:
                 for text in stream.text_stream:
                     yield text
                 # This records usage and other data:
                 response.response_json = stream.get_final_message().model_dump()
+                usage = response.response_json.pop("usage")
         else:
             completion = client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+            usage = response.response_json.pop("usage")
+        if usage:
+            response.set_usage(
+                input=usage.get("input_tokens"), output=usage.get("output_tokens")
+            )
 
 
 class ClaudeMessagesLong(ClaudeMessages):

simonw (Owner, Author) commented Nov 20, 2024

Better Claude diff, factoring the usage handling into a shared set_usage() helper so both the sync and async models use it:

diff --git a/llm_claude_3.py b/llm_claude_3.py
index a05b01b..0a6e236 100644
--- a/llm_claude_3.py
+++ b/llm_claude_3.py
@@ -231,6 +231,13 @@ class _Shared:
             kwargs["extra_headers"] = self.extra_headers
         return kwargs
 
+    def set_usage(self, response):
+        usage = response.response_json.pop("usage")
+        if usage:
+            response.set_usage(
+                input=usage.get("input_tokens"), output=usage.get("output_tokens")
+            )
+
     def __str__(self):
         return "Anthropic Messages: {}".format(self.model_id)
 
@@ -250,6 +257,7 @@ class ClaudeMessages(_Shared, llm.Model):
             completion = client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+        self.set_usage(response)
 
 
 class ClaudeMessagesLong(ClaudeMessages):
@@ -270,6 +278,7 @@ class AsyncClaudeMessages(_Shared, llm.AsyncModel):
             completion = await client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+        self.set_usage(response)
 
 
 class AsyncClaudeMessagesLong(AsyncClaudeMessages):
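
With a diff like this applied, a rough end-to-end check from Python might look like the sketch below. The usage() accessor here is an assumption, pending the Python API documentation mentioned in the TODO above:

import llm

# claude-3.5-sonnet is registered by the llm-claude-3 plugin
model = llm.get_model("claude-3.5-sonnet")
response = model.prompt("Two short names for a pet pelican")
print(response.text())
# Assumed accessor for the recorded input/output token counts
print(response.usage())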

simonw marked this pull request as ready for review November 20, 2024 04:15
simonw merged commit cfb10f4 into main Nov 20, 2024
61 checks passed
simonw deleted the usage branch November 20, 2024 04:22