feat(responses)!: add Prompts API to Responses API #3514
Conversation
Force-pushed ef753bc to fe6ea4c
This is an API change unrelated to how prompts are used in /v1/responses.
Please review your code assistant output before posting as a PR.
Hi @mattf! Could you please elaborate on how prompts should be used in the Responses API, in your opinion? My understanding was that they should be propagated to the Agent's messages context as OpenAISystemMessageParam.
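For context, a minimal sketch of that interpretation (which is revised later in this thread) might look like the following; the Prompts API accessor, its signature, and the surrounding wiring are assumptions, not code from this PR:

```python
async def build_messages(prompts_api, prompt_id: str, user_messages: list):
    # Hypothetical lookup; get_prompt and its signature are assumptions.
    prompt = await prompts_api.get_prompt(prompt_id)
    # Prepend the stored template text to the agent's message context as a
    # system message, per the interpretation described above.
    system_msg = OpenAISystemMessageParam(content=prompt.prompt)
    return [system_msg, *user_messages]
```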
Hey @r3v5 it looks like you've suggested adding prompt_id here where you need to add a Prompt object with an id, version, and variables, which would then be consistent with OpenAI's client usage, as outlined here:
```python
response = client.responses.create(
    prompt={
        "id": "pmpt_68b0c29740048196bd3a6e6ac3c4d0e20ed9a13f0d15bf5e",
        "version": "2",
        "variables": {
            "city": "San Francisco",
            "age": 30,
        }
    }
)
```

So this is currently incorrect. As @mattf suggested, let's make sure we double check this. Thank you.
Oh yeah, this makes sense. I got it. I will adjust the implementation then.
Force-pushed fe6ea4c to a3cdf78
| """ | ||
|
|
||
| id: str | ||
| version: str | None = None |
Version has type string because OpenAI defines it as a string. Reference is here.
@cdoern this is an enhancement to the /openai/v1/responses API; does it match the OpenAI /v1/responses API spec?
There seem to be some breaking changes, BUT these might have existed in main; let me check.
Force-pushed a3cdf78 to d76b15b
hey @cdoern any update on the main branch check here?
Force-pushed fadf1d0 to f474e0c
Force-pushed 4175600 to f37efb1
The PR description shows a working example of Prompt inside the Response create, but I'd like to see
Force-pushed d63e31f to b954305
@r3v5 unit tests are failing
I just rebased from main today. I still haven't finished my implementation.
Force-pushed 1660935 to 7a7b2b7
Hey @leseb, @franciscojavierarceo! Here I provide comprehensive testing of Prompts support in the Responses API via curl requests to the LLS server.

- Test prompts with images that have text on them in the Responses API. I used this image for testing purposes: iphone 17 image. Output after inferencing:
- The same example, but without providing the description of the product. Output:
- Test prompts with PDF files in the Responses API. I used this PDF file for testing purposes: invoicesample.pdf. Output after inferencing:
- Test a simple text prompt in the Responses API. Output after inferencing:
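For reference, a minimal sketch of the kind of request these tests exercise, written with the OpenAI Python client against a local Llama Stack server; the base URL, API key, model id, prompt id, and variable name below are placeholders, not values from the PR:

```python
from openai import OpenAI

# Placeholder base_url/api_key for a local Llama Stack server.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    prompt={
        "id": "pmpt_...",  # id returned by POST /v1/prompts
        "version": "1",
        "variables": {
            # a simple text variable; image/file variables use the content
            # shapes discussed later in this thread
            "product_name": "iPhone 17",
        },
    },
)
print(response.output_text)
```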
The implementation is there :)

sorry @r3v5 this keeps getting wrecked 😭

last rebase and I think we're good to go

We haven't landed the Prompts API implementation, have we?

No worries, @franciscojavierarceo! I will rebase today.
Force-pushed 7a7b2b7 to 59169bf
I rebased from main, CI is green!
make sure to expand the title & description of this PR to match the expanded scope
also, make sure there is test coverage for the new APIs as they're used outside of prompts
I have updated the PR description now.
```python
    output: list[OpenAIResponseOutput]
    parallel_tool_calls: bool = False
    previous_response_id: str | None = None
    prompt: Prompt | None = None
```
but then is this object the full one?! and not the same object you created above?
When an instance of the OpenAIResponseObject class is created, it correctly contains a link to a Prompt object. If the user doesn't provide a prompt when creating a response, there is no link to any prompt.
The prompt params we use when creating a response refer to the OpenAIResponsePromptParam class, which handles the different types of prompt variables:
```python
@json_schema_type
class OpenAIResponsePromptParam(BaseModel):
    """Prompt object that is used for OpenAI responses.

    :param id: Unique identifier of the prompt template
    :param variables: Dictionary of variable names to OpenAIResponseInputMessageContent structure for template substitution
    :param version: Version number of the prompt to use (defaults to latest if not specified)
    """

    id: str
    variables: dict[str, OpenAIResponseInputMessageContent] | None = None
    version: str | None = None
```
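For illustration, constructing this param might look like the sketch below; OpenAIResponseInputMessageContentText and its fields are assumptions based on the content types named elsewhere in this PR, not verified against the final schema:

```python
# Minimal sketch (not from the PR): building an OpenAIResponsePromptParam
# with a single text variable. The content class and its fields are assumed.
param = OpenAIResponsePromptParam(
    id="pmpt_68b0c29740048196bd3a6e6ac3c4d0e20ed9a13f0d15bf5e",
    version="2",
    variables={
        "city": OpenAIResponseInputMessageContentText(text="San Francisco"),
    },
)
```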
The Prompt class sits in the Prompts API, while the OpenAIResponsePromptParam helper structure is defined in the Agents API in apis/agents/openai_responses.py.
@r3v5 now I am not so sure. OpenAI's "Response object" doc (in their reference) says the prompt field within contains exactly three fields: { id, variables, version }. On the other hand, the Prompt field in our incarnation of the Prompts API (which is NOT part of the OpenAI API set) has the fields { prompt_id, version, variables, is_default, prompt }.
This is a discrepancy -- at least the { id } field is a clear discrepancy, but even referencing that other object we made up seems wrong. This is what @leseb brought up before.
Got it. What do you think we should settle on in terms of classes? As you said, we now have two classes for prompts on different API layers.
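To make the discrepancy concrete, here is a hypothetical mapping between the two classes, using only the field names quoted in the comments above; nothing like this exists in the PR:

```python
# Prompts API object (not part of the OpenAI spec):
#   Prompt(prompt_id, version, variables, is_default, prompt)
# Responses API param (mirrors OpenAI's { id, variables, version }):
#   OpenAIResponsePromptParam(id, variables, version)
def to_response_prompt_param(p: Prompt) -> OpenAIResponsePromptParam:
    # Hypothetical adapter: renames prompt_id -> id and drops the
    # server-side fields (is_default and the template text itself).
    # Prompt.variables holds variable *names*, not values, so it cannot
    # populate the param's name -> content mapping.
    return OpenAIResponsePromptParam(id=p.prompt_id, version=str(p.version))
```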
| backend="sql_default", | ||
| table_name="openai_conversations", | ||
| ).model_dump(exclude_none=True), | ||
| "prompts": SqlStoreReference( |
this change should be a separate PR completely
I think you should just separate this PR into three PRs:
- first is the change I point out here
- second is the change to the API only -- no implementation at all
- third is the implementation and tests
Yeah, I see. I will do it. Should I create issues for each PR or just submit PRs?
@r3v5 I think PRs all linking to the same issue is fine.
I submitted the first PR of three
# What does this PR do?

This PR is responsible for attaching prompts to storage stores in run configs. It allows specifying prompts as stores in different distributions. The need for this functionality was raised in #3514.

> Note: #3514 is divided into three separate PRs. The current PR is the first of the three.

## Test Plan

Manual testing and updated CI unit tests.

Prerequisites:

1. `uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install`
2. `llama stack run starter`

```
INFO 2025-10-23 15:36:17,387 llama_stack.cli.stack.run:100 cli: Using run configuration: /Users/ianmiller/llama-stack/llama_stack/distributions/starter/run.yaml
INFO 2025-10-23 15:36:17,423 llama_stack.cli.stack.run:157 cli: HTTPS enabled with certificates: Key: None Cert: None
INFO 2025-10-23 15:36:17,424 llama_stack.cli.stack.run:159 cli: Listening on ['::', '0.0.0.0']:8321
INFO 2025-10-23 15:36:17,749 llama_stack.core.server.server:521 core::server: Run configuration:
INFO 2025-10-23 15:36:17,756 llama_stack.core.server.server:524 core::server: apis:
- agents
- batches
- datasetio
- eval
- files
- inference
- post_training
- safety
- scoring
- tool_runtime
- vector_io
image_name: starter
providers:
  agents:
  - config:
      persistence:
        agent_state:
          backend: kv_default
          namespace: agents
        responses:
          backend: sql_default
          max_write_queue_size: 10000
          num_writers: 4
          table_name: responses
    provider_id: meta-reference
    provider_type: inline::meta-reference
  batches:
  - config:
      kvstore:
        backend: kv_default
        namespace: batches
    provider_id: reference
    provider_type: inline::reference
  datasetio:
  - config:
      kvstore:
        backend: kv_default
        namespace: datasetio::huggingface
    provider_id: huggingface
    provider_type: remote::huggingface
  - config:
      kvstore:
        backend: kv_default
        namespace: datasetio::localfs
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config:
      kvstore:
        backend: kv_default
        namespace: eval
    provider_id: meta-reference
    provider_type: inline::meta-reference
  files:
  - config:
      metadata_store:
        backend: sql_default
        table_name: files_metadata
      storage_dir: /Users/ianmiller/.llama/distributions/starter/files
    provider_id: meta-reference-files
    provider_type: inline::localfs
  inference:
  - config:
      api_key: '********'
      url: https://api.fireworks.ai/inference/v1
    provider_id: fireworks
    provider_type: remote::fireworks
  - config:
      api_key: '********'
      url: https://api.together.xyz/v1
    provider_id: together
    provider_type: remote::together
  - config: {}
    provider_id: bedrock
    provider_type: remote::bedrock
  - config:
      api_key: '********'
      base_url: https://api.openai.com/v1
    provider_id: openai
    provider_type: remote::openai
  - config:
      api_key: '********'
    provider_id: anthropic
    provider_type: remote::anthropic
  - config:
      api_key: '********'
    provider_id: gemini
    provider_type: remote::gemini
  - config:
      api_key: '********'
      url: https://api.groq.com
    provider_id: groq
    provider_type: remote::groq
  - config:
      api_key: '********'
      url: https://api.sambanova.ai/v1
    provider_id: sambanova
    provider_type: remote::sambanova
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  post_training:
  - config:
      checkpoint_format: meta
    provider_id: torchtune-cpu
    provider_type: inline::torchtune-cpu
  safety:
  - config:
      excluded_categories: []
    provider_id: llama-guard
    provider_type: inline::llama-guard
  - config: {}
    provider_id: code-scanner
    provider_type: inline::code-scanner
  scoring:
  - config: {}
    provider_id: basic
    provider_type: inline::basic
  - config: {}
    provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
  - config:
      openai_api_key: '********'
    provider_id: braintrust
    provider_type: inline::braintrust
  tool_runtime:
  - config:
      api_key: '********'
      max_results: 3
    provider_id: brave-search
    provider_type: remote::brave-search
  - config:
      api_key: '********'
      max_results: 3
    provider_id: tavily-search
    provider_type: remote::tavily-search
  - config: {}
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  - config: {}
    provider_id: model-context-protocol
    provider_type: remote::model-context-protocol
  vector_io:
  - config:
      persistence:
        backend: kv_default
        namespace: vector_io::faiss
    provider_id: faiss
    provider_type: inline::faiss
  - config:
      db_path: /Users/ianmiller/.llama/distributions/starter/sqlite_vec.db
      persistence:
        backend: kv_default
        namespace: vector_io::sqlite_vec
    provider_id: sqlite-vec
    provider_type: inline::sqlite-vec
registered_resources:
  benchmarks: []
  datasets: []
  models: []
  scoring_fns: []
  shields: []
  tool_groups:
  - provider_id: tavily-search
    toolgroup_id: builtin::websearch
  - provider_id: rag-runtime
    toolgroup_id: builtin::rag
  vector_stores: []
server:
  port: 8321
storage:
  backends:
    kv_default:
      db_path: /Users/ianmiller/.llama/distributions/starter/kvstore.db
      type: kv_sqlite
    sql_default:
      db_path: /Users/ianmiller/.llama/distributions/starter/sql_store.db
      type: sql_sqlite
  stores:
    conversations:
      backend: sql_default
      table_name: openai_conversations
    inference:
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
      table_name: inference_store
    metadata:
      backend: kv_default
      namespace: registry
    prompts:
      backend: kv_default
      namespace: prompts
telemetry:
  enabled: true
vector_stores:
  default_embedding_model:
    model_id: nomic-ai/nomic-embed-text-v1.5
    provider_id: sentence-transformers
  default_provider_id: faiss
version: 2
INFO 2025-10-23 15:36:20,032 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
WARNING 2025-10-23 15:36:20,422 llama_stack.providers.inline.telemetry.meta_reference.telemetry:84 telemetry: OTEL_EXPORTER_OTLP_ENDPOINT is not set, skipping telemetry
INFO 2025-10-23 15:36:22,379 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models
INFO 2025-10-23 15:36:22,703 uvicorn.error:84 uncategorized: Started server process [17328]
INFO 2025-10-23 15:36:22,704 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO 2025-10-23 15:36:22,706 llama_stack.core.server.server:179 core::server: Starting up Llama Stack server (version: 0.3.0)
INFO 2025-10-23 15:36:22,707 llama_stack.core.stack:470 core: starting registry refresh task
INFO 2025-10-23 15:36:22,708 uvicorn.error:62 uncategorized: Application startup complete.
INFO 2025-10-23 15:36:22,708 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

As you can see, prompts are attached to stores in the config.

Testing:

1. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

Response:

```
{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}
```

2. Get prompt:

```
curl -X GET http://localhost:8321/v1/prompts/pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f
```

Response:

```
{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}
```

3. Query the sqlite KV storage to check the created prompt (row values shown unwrapped for readability):

```
sqlite> .mode column
sqlite> .headers on
sqlite> SELECT * FROM kvstore WHERE key LIKE 'prompts:v1:%';
key                                                                       value
prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f:1        {"prompt_id": "pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f", "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.", "version": 1, "variables": ["name", "company", "role", "tone"], "is_default": false}
prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f:default  1
```
# What does this PR do?
The purpose of this PR is to integrate the Prompts API into the Responses API to achieve full OpenAI compatibility for the current Responses API in Llama Stack.
Prompt variables can carry the following content types:
- OpenAIResponseInputMessageContentText object
- OpenAIResponseInputMessageContentImage object
- OpenAIResponseInputMessageContentFile object

This is done to match the OpenAI API specs. Reference can be found here.
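For example, a variables map mixing these three content types might look like the sketch below; the exact field names ("input_text", "image_url", "file_id", and so on) are assumptions modeled on OpenAI-style input content, not verified against this PR's schemas:

```python
# Sketch: one variable of each content type listed above. Field names
# are assumptions, not taken from this PR.
variables = {
    "city": {"type": "input_text", "text": "San Francisco"},
    "photo": {"type": "input_image", "image_url": "https://example.com/iphone17.png"},
    "invoice": {"type": "input_file", "file_id": "file_abc123"},
}
```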
Closes #3321
## Test Plan
Manual API testing and running newly added unit tests.
Prerequisites:
`uv run --with llama-stack llama stack build --distro starter --image-type venv --run`

The comprehensive testing can be found here.