feat: new /v1/responses #161
Conversation
…ructure
- Updated OpenAI dependency to version 1.99.2 in both `pyproject.toml` files for `nilai-api` and `nilai-common`.
- Enhanced the response model in `responses_model.py` by adding new fields and improving type definitions.
- Refactored response handling in `responses.py` to include usage tracking for input and output tokens.
- Adjusted import statements in `__init__.py` to streamline model access.

- Updated return types in `route_and_execute_tool_call` and `process_tool_calls` to use `FunctionCallOutput`.
- Improved error handling and logging in tool execution.
- Adjusted input handling in `handle_responses_tool_workflow` to support lists of `ResponseInputParam`.
- Added new imports for `FunctionCallOutput` and related types in `nilai_common` models.

- Changed `ResponseFunctionToolCall` to `ResponseFunctionToolCallParam` in multiple functions for better type consistency.
- Enhanced `handle_responses_tool_workflow` to utilize new input item types and improved handling of tool call results.
- Updated imports in `__init__.py` and other files to reflect new model structures.
…e tests architecture
- Changed EC2 instance type from g4dn.xlarge to g6.xlarge in the CI workflow.
- Updated the docker-compose command to use the new GPT-20B configuration file.
- Added a new docker-compose file for the GPT-20B GPU service, including environment settings and health checks.
- Updated the CI model reference in the test configuration to use the new GPT-20B model.
…+ ci model configuration
Force-pushed from d06c64d to d0c5385
- Changed environment setting from "ci" to "mainnet" in config.yaml.
- Updated authentication strategy to "api_key".
- Adjusted rate-limiting parameters for web search functionality.
- Refactored chat completion logic to improve handling of tool support and request parameters.
- Added detailed docstrings to various classes and methods for better clarity.
Pull Request Overview
This PR implements a new OpenAI-compatible /v1/responses API endpoint alongside architectural improvements to the codebase. The endpoint provides a more flexible interface for AI interactions with support for structured input, tool calling, web search, and streaming capabilities.
Key changes:
- Adds a new `/v1/responses` endpoint with complete feature parity to chat completions
- Refactors the monolithic `private.py` into modular endpoint files (`chat.py`, `responses.py`)
- Introduces `responses_tool_router.py` for responses-specific tool workflow handling
- Updates the `nilai-common` package structure from `api_model.py` to an `api_models/` module with separate files for different API types
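To make the new surface concrete, here is a hedged sketch of how a client might call the endpoint once deployed. The base URL and API key are placeholders, `gpt-oss-20b` is taken from this PR's CI config, and the call shape follows the OpenAI SDK's Responses API, which the upgraded SDK version supports:

```python
from openai import OpenAI

# Placeholder deployment URL and key; not part of this PR.
client = OpenAI(base_url="https://nilai.example.com/v1", api_key="sk-...")

response = client.responses.create(
    model="gpt-oss-20b",
    input="Summarize the difference between /v1/responses and chat completions.",
)
print(response.output_text)  # convenience accessor for the generated text
```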
Reviewed Changes
Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `tests/unit/nilai_api/test_web_search.py` | Updates import path from `api_model` to `api_models` |
| `tests/unit/nilai_api/routers/test_responses_private.py` | New test suite for responses endpoint covering standard requests, streaming, and tool workflows |
| `tests/unit/nilai_api/routers/test_nildb_endpoints.py` | Updates import paths and adds tests for responses endpoint with nilDB integration |
| `tests/unit/nilai_api/routers/test_chat_completions_private.py` | Updates import paths and mocks for refactored chat endpoint |
| `tests/unit/nilai_api/handlers/tools/test_responses_tool_router.py` | New test suite for responses-specific tool routing and execution |
| `tests/unit/nilai-common/test_discovery.py` | Updates import path from `api_model` to `api_models` |
| `tests/unit/nilai-common/test_api_model.py` | Updates import path from `api_model` to `api_models` |
| `tests/unit/__init__.py` | Adds test fixtures for responses API models |
| `tests/e2e/test_responses_http.py` | Comprehensive E2E tests for responses endpoint using HTTP client |
| `tests/e2e/test_responses.py` | E2E tests for responses endpoint using OpenAI client |
| `tests/e2e/test_chat_completions_http.py` | Updates and additions for chat completions testing |
| `tests/e2e/test_chat_completions.py` | Updates and additions for chat completions testing |
| `tests/e2e/nuc.py` | Extends NUC token expiration from 5 minutes to 1 hour |
| `tests/e2e/config.py` | Changes CI test model from Llama-3.2-1B to gpt-oss-20b |
| `packages/nilai-common/src/nilai_common/discovery.py` | Updates import path from `api_model` to `api_models` |
| `packages/nilai-common/src/nilai_common/api_models/responses_model.py` | New module defining responses API models |
| `packages/nilai-common/src/nilai_common/api_models/common_model.py` | Shared models extracted from `chat_completion_model` |
| `packages/nilai-common/src/nilai_common/api_models/chat_completion_model.py` | Refactored to use common models and remove duplicates |
| `packages/nilai-common/src/nilai_common/api_models/__init__.py` | Module initialization with all API model exports |
| `packages/nilai-common/src/nilai_common/__init__.py` | Updates imports to use new `api_models` module |
| `packages/nilai-common/pyproject.toml` | Upgrades OpenAI SDK from 1.59.9 to 1.99.2 |
| `nilai-attestation/src/nilai_attestation/attestation/nvtrust/nv_verifier.py` | Updates import path from `api_model` to `api_models` |
| `nilai-api/src/nilai_api/state.py` | Updates import path from `api_model` to `api_models` |
| `nilai-api/src/nilai_api/routers/private.py` | Refactored to include modular endpoint routers instead of implementing all endpoints |
| `nilai-api/src/nilai_api/routers/endpoints/responses.py` | New responses endpoint implementation with full feature support |
| `nilai-api/src/nilai_api/routers/endpoints/chat.py` | Extracted chat completions endpoint from `private.py` |
| `nilai-api/src/nilai_api/handlers/web_search.py` | Adds web search support for responses API |
| `nilai-api/src/nilai_api/handlers/tools/tool_router.py` | Adds unknown tool validation for chat completions |
| `nilai-api/src/nilai_api/handlers/tools/responses_tool_router.py` | New tool router for responses API workflow |
| `nilai-api/src/nilai_api/config/tools.py` | New configuration model for tool settings |
| `nilai-api/src/nilai_api/config/config.yaml` | Adds tools configuration and adjusts web search rate limits |
| `nilai-api/src/nilai_api/config/__init__.py` | Integrates tools configuration into main config |
| `nilai-api/pyproject.toml` | Upgrades OpenAI SDK from 1.59.9 to 1.99.2 |
| `docker/compose/docker-compose.gpt-20b-gpu.ci.yml` | New Docker compose configuration for GPT-20B model |
| `.github/workflows/cicd.yml` | Updates CI to use GPT-20B model and US-East-1 region |
| `scripts/wait_for_ci_services.sh` | Updates to wait for GPT-20B container instead of Llama-1B |
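The `private.py` refactor listed above reduces that file to thin router wiring. A minimal sketch of what that composition could look like, assuming the FastAPI routers implied by the repo layout (module paths are taken from the table; the `router` attribute names are assumptions):

```python
# Sketch of nilai-api/src/nilai_api/routers/private.py after the split.
from fastapi import APIRouter

from nilai_api.routers.endpoints import chat, responses

router = APIRouter()
router.include_router(chat.router)       # serves /v1/chat/completions
router.include_router(responses.router)  # serves /v1/responses
```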
Force-pushed from 02ad2e7 to b8dfd5c
nilai-api/src/nilai_api/handlers/tools/responses_tool_router.py (outdated; resolved)
```python
code = args.get("code", "")
result = await code_execution.execute_python(code)
```
Execute empty code? Maybe not worth even trying and excepting.
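A sketch of the early-return guard being suggested, reusing the names from the hunk above (the surrounding handler context is assumed):

```python
code = args.get("code", "")
if not code.strip():
    # Skip the execution round trip entirely rather than try/except empty input.
    output_json_string = json.dumps({"error": "Empty code; nothing to execute."})
else:
    result = await code_execution.execute_python(code)
    output_json_string = json.dumps({"result": str(result).strip()})
```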
```python
result = await code_execution.execute_python(code)

output_json_string = json.dumps({"result": str(result).strip()})
logger.info(f"[responses_tool] execute_python result: {result.strip()}")
```
Part of being a totally private inference service is not logging everything the user does. We could add DEBUG logs to check this when testing, but they should not be on in prod.
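A sketch of the hedged alternative: keep the log line, but demote it to DEBUG and log only metadata rather than the user's result:

```python
# DEBUG-only, and no user content: log the result size, not the result itself.
logger.debug("[responses_tool] execute_python finished (result length=%d)", len(str(result)))
```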
```python
output_json_string = json.dumps(
    {"error": "Invalid JSON in tool call arguments."}
)
except Exception as e:
```
On error, it is reasonable to log with `logger.error`.
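A sketch of what that could look like in the `except` branch above (message text is illustrative; logging only the exception type keeps this consistent with the privacy comment earlier):

```python
except Exception as e:
    # Surface the failure in logs without leaking tool arguments or results.
    logger.error("[responses_tool] tool execution failed: %s", type(e).__name__)
    output_json_string = json.dumps({"error": "Tool execution failed."})
```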
```python
if not tool_calls:
    return first_response, prompt_tokens, completion_tokens

unknown = [
```
Could you improve the naming of `unknown` to `unknown_tool` or something?
```python
return first_response, prompt_tokens, completion_tokens

unknown = [
    tc for tc in tool_calls if tc["name"] not in CONFIG.tools.implemented_tools
```
Why do we want `implemented_tools` to be a config value? Happy to debate it, but the tools that are available are determined by the code you implement, not by a config value. If we want some config, it can be used to deactivate certain tools; the configuration, in any case, should not determine the implementation.
Maybe the wording is just not right and should be expressed some other way.
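A sketch of the registry-driven arrangement this comment describes, where code defines the tools and config may only switch them off (the handler names and the `disabled_tools` setting are hypothetical):

```python
# Code is the source of truth for which tools exist.
TOOL_HANDLERS = {
    "execute_python": execute_python_handler,  # hypothetical handler names
    "web_search": web_search_handler,
}

# Config can only subtract from the registry, never extend it.
disabled = set(getattr(CONFIG.tools, "disabled_tools", []))  # hypothetical setting
available_tools = set(TOOL_HANDLERS) - disabled

unknown_tool_calls = [tc for tc in tool_calls if tc["name"] not in available_tools]
```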
```python
unknown = [
    tc
    for tc in tool_calls
    if tc.function.name not in CONFIG.tools.implemented_tools
]
```
Same here. `unknown` is not a good variable name; it should be called `unknown_tool_handlers` or similar.
```python
return WebSearchEnhancedMessages(messages=req.messages, sources=all_sources)


async def enhance_input_with_web_search(
```
Please add docstrings.
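A sketch of the kind of docstring being requested for the function in the hunk above (the parameter list is assumed):

```python
async def enhance_input_with_web_search(req):  # signature assumed
    """Augment a responses-API input with web search context.

    Runs the configured web search for the user's query, prepends the
    retrieved context to the input items, and returns the enhanced input
    together with the sources that were consulted.
    """
```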
Most of the function contents in these files look like clones of the functions already present in web_search.py. I would remove the copied parts and share the common functionality among the different functions; otherwise a change to one section of the code means changing the same thing in four places. Same for the prompts: try to have one definition referenced from multiple places if possible.
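One way to realize the suggested sharing is a single search core with thin per-API adapters; a hedged sketch (all helper names here are hypothetical, not the actual web_search.py API):

```python
from typing import NamedTuple

class SearchContext(NamedTuple):
    context: str        # text block to prepend to the model input
    sources: list[str]  # URLs that produced the context

async def run_web_search(query: str) -> SearchContext:
    """The one shared implementation; a fix here propagates everywhere."""
    results = await search_client.search(query)  # hypothetical client
    return SearchContext(
        context="\n".join(r.snippet for r in results),
        sources=[r.url for r in results],
    )

async def enhance_messages_with_web_search(messages, query):
    ctx = await run_web_search(query)
    return [{"role": "system", "content": ctx.context}, *messages], ctx.sources

async def enhance_input_with_web_search(input_items, query):
    ctx = await run_web_search(query)
    return [{"role": "system", "content": ctx.context}, *input_items], ctx.sources
```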
```python
logger.info(
    f"[responses] start request_id={request_id} user={auth_info.user.userid} model={model_name} stream={req.stream} web_search={bool(req.web_search)} tools={bool(req.tools)} multimodal={has_multimodal} url={model_url}"
)
```
Not great to have such detailed logging if it is not for debug purposes.
```diff
@@ -209,7 +160,7 @@ def is_text_part(self) -> bool:

 def is_multimodal_part(self) -> bool:
     c = self.content
     if c is None:  # tool calling message
```
I don't know why this comment was removed. Isn't it accurate?
Why did this file require these changes?
tests/e2e/test_chat_completions.py (outdated)
```python
assert forbidden, "No NILDB command detected, when expected"


@pytest.mark.rerun(3)
```
Why do we need to rerun it 3 times? Does this mean it can fail 2/3 times?
```diff
 {
     "role": "system",
-    "content": "You are a helpful assistant that provides accurate and concise information.",
+    "content": "You are a helpful assistant that provides accurate and concise information. You must use the get_weather tool to get the weather information.",
```
Is this change required, given the test is skipped?
```python
        },
    }
],
tool_choice={"type": "function", "function": {"name": "get_weather"}},
```
Same here. Is this change required?
```python
mock_async_openai_instance = MagicMock()
mock_async_openai_instance.chat = mock_chat
mocker.patch(
    "nilai_api.routers.private.AsyncOpenAI", return_value=mock_async_openai_instance
```
More of the same: a lot of comments deleted for no apparent reason.
For the most part, it looks good to me. This is a huge PR, though. I think it would be useful for you to go through the Files Changed view of the PR: many of these comments become obvious from that view if you check whether comments were removed, docstrings are missing, or unused tests received changes.
There are some very good reorganizations of the code, though.
…rch handling
- Removed ToolsConfig and related configurations from the project.
- Updated web search logic to improve handling of multiple topic queries and responses.
- Enhanced error handling and logging in tool execution workflows.
- Refactored response handling to streamline the integration of web search results.
Overview
Implements the OpenAI-compatible `/v1/responses` API, a more flexible alternative to Chat Completions with structured input, tool calling, web search, and streaming support.

Key Features
- New `/v1/responses` endpoint
- Advanced Tool Handling
- Web Search Integration via the `web_search` param
- Multimodal Support
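Streaming against the endpoint (reusing the client from the sketch earlier on this page) would then follow the standard Responses streaming loop; the event names below are those of the OpenAI SDK that this endpoint mirrors:

```python
stream = client.responses.create(
    model="gpt-oss-20b",
    input="Write one sentence about private inference.",
    stream=True,
)
for event in stream:
    # Incremental text arrives as response.output_text.delta events.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```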
Architecture
- Split `private.py` into modular endpoints:
  - `/v1/chat/completions` → `chat.py`
  - `/v1/responses` → `responses.py`
- New `responses_tool_router.py` for tool workflows
- Modular API models in `nilai-common`

Technical Highlights