Skip to content

Conversation

@blefo
Copy link
Member

@blefo blefo commented Oct 10, 2025

Overview

Implements the OpenAI-compatible /v1/responses API, a more flexible alternative to Chat Completions with structured input, tool calling, web search, and streaming support.

Key Features

  • New /v1/responses endpoint

    • OpenAI-compatible; supports streaming/non-streaming
    • Integrated auth, rate limits, token tracking, and response signing
  • Advanced Tool Handling

    • Auto-detects and executes tool calls
    • Multi-turn workflows with context preservation
    • Supports Python execution and concurrent tools
  • Web Search Integration

    • Optional Brave Search enrichment via web_search param
    • Adds real-time context with source attribution
  • Multimodal Support

    • Image input validation for compatible models

Architecture

  • Split private.py into modular endpoints:

    • /v1/chat/completions → chat.py
    • /v1/responses → responses.py
  • New responses_tool_router.py for tool workflows

  • Modular API models in nilai-common

Technical Highlights

  • Validates model capabilities (tools, multimodal, web search)
  • Supports NilDB prompt retrieval and signed responses
  • SSE streaming with token usage and attribution

blefo added 16 commits October 7, 2025 10:35
…ructure

- Updated OpenAI dependency to version 1.99.2 in both `pyproject.toml` files for `nilai-api` and `nilai-common`.
- Enhanced response model in `responses_model.py` by adding new fields and improving type definitions.
- Refactored response handling in `responses.py` to include usage tracking for input and output tokens.
- Adjusted import statements in `__init__.py` to streamline model access.
- Updated return types in `route_and_execute_tool_call` and `process_tool_calls` to use `FunctionCallOutput`.
- Improved error handling and logging in tool execution.
- Adjusted input handling in `handle_responses_tool_workflow` to support lists of `ResponseInputParam`.
- Added new imports for `FunctionCallOutput` and related types in `nilai_common` models.
- Changed `ResponseFunctionToolCall` to `ResponseFunctionToolCallParam` in multiple functions for better type consistency.
- Enhanced `handle_responses_tool_workflow` to utilize new input item types and improved handling of tool call results.
- Updated imports in `__init__.py` and other files to reflect new model structures.
- Changed EC2 instance type from g4dn.xlarge to g6.xlarge in the CI workflow.
- Updated the docker-compose command to use the new GPT-20B configuration file.
- Added a new docker-compose file for the GPT-20B GPU service, including environment settings and health checks.
- Updated the CI model reference in the test configuration to use the new GPT-20B model.
@blefo blefo force-pushed the feat-responses-migration branch from d06c64d to d0c5385 Compare October 23, 2025 07:54
- Changed environment setting from "ci" to "mainnet" in config.yaml.
- Updated authentication strategy to "api_key".
- Adjusted rate limiting parameters for web search functionality.
- Refactored chat completion logic to improve handling of tool support and request parameters.
- Added detailed docstrings to various classes and methods for better clarity.
@blefo blefo requested a review from Copilot October 23, 2025 11:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a new OpenAI-compatible /v1/responses API endpoint alongside architectural improvements to the codebase. The endpoint provides a more flexible interface for AI interactions with support for structured input, tool calling, web search, and streaming capabilities.

Key changes:

  • Adds new /v1/responses endpoint with complete feature parity to chat completions
  • Refactors monolithic private.py into modular endpoint files (chat.py, responses.py)
  • Introduces responses_tool_router.py for responses-specific tool workflow handling
  • Updates nilai-common package structure from api_model.py to api_models/ module with separate files for different API types

Reviewed Changes

Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/nilai_api/test_web_search.py Updates import path from api_model to api_models
tests/unit/nilai_api/routers/test_responses_private.py New test suite for responses endpoint covering standard requests, streaming, and tool workflows
tests/unit/nilai_api/routers/test_nildb_endpoints.py Updates import paths and adds tests for responses endpoint with nilDB integration
tests/unit/nilai_api/routers/test_chat_completions_private.py Updates import paths and mocks for refactored chat endpoint
tests/unit/nilai_api/handlers/tools/test_responses_tool_router.py New test suite for responses-specific tool routing and execution
tests/unit/nilai-common/test_discovery.py Updates import path from api_model to api_models
tests/unit/nilai-common/test_api_model.py Updates import path from api_model to api_models
tests/unit/__init__.py Adds test fixtures for responses API models
tests/e2e/test_responses_http.py Comprehensive E2E tests for responses endpoint using HTTP client
tests/e2e/test_responses.py E2E tests for responses endpoint using OpenAI client
tests/e2e/test_chat_completions_http.py Updates and additions for chat completions testing
tests/e2e/test_chat_completions.py Updates and additions for chat completions testing
tests/e2e/nuc.py Extends NUC token expiration from 5 minutes to 1 hour
tests/e2e/config.py Changes CI test model from Llama-3.2-1B to gpt-oss-20b
packages/nilai-common/src/nilai_common/discovery.py Updates import path from api_model to api_models
packages/nilai-common/src/nilai_common/api_models/responses_model.py New module defining responses API models
packages/nilai-common/src/nilai_common/api_models/common_model.py Shared models extracted from chat_completion_model
packages/nilai-common/src/nilai_common/api_models/chat_completion_model.py Refactored to use common models and remove duplicates
packages/nilai-common/src/nilai_common/api_models/__init__.py Module initialization with all API model exports
packages/nilai-common/src/nilai_common/__init__.py Updates imports to use new api_models module
packages/nilai-common/pyproject.toml Upgrades OpenAI SDK from 1.59.9 to 1.99.2
nilai-attestation/src/nilai_attestation/attestation/nvtrust/nv_verifier.py Updates import path from api_model to api_models
nilai-api/src/nilai_api/state.py Updates import path from api_model to api_models
nilai-api/src/nilai_api/routers/private.py Refactored to include modular endpoint routers instead of implementing all endpoints
nilai-api/src/nilai_api/routers/endpoints/responses.py New responses endpoint implementation with full feature support
nilai-api/src/nilai_api/routers/endpoints/chat.py Extracted chat completions endpoint from private.py
nilai-api/src/nilai_api/handlers/web_search.py Adds web search support for responses API
nilai-api/src/nilai_api/handlers/tools/tool_router.py Adds unknown tool validation for chat completions
nilai-api/src/nilai_api/handlers/tools/responses_tool_router.py New tool router for responses API workflow
nilai-api/src/nilai_api/config/tools.py New configuration model for tool settings
nilai-api/src/nilai_api/config/config.yaml Adds tools configuration and adjusts web search rate limits
nilai-api/src/nilai_api/config/__init__.py Integrates tools configuration into main config
nilai-api/pyproject.toml Upgrades OpenAI SDK from 1.59.9 to 1.99.2
docker/compose/docker-compose.gpt-20b-gpu.ci.yml New Docker compose configuration for GPT-20B model
.github/workflows/cicd.yml Updates CI to use GPT-20B model and US-East-1 region
scripts/wait_for_ci_services.sh Updates to wait for GPT-20B container instead of Llama-1B

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@blefo blefo force-pushed the feat-responses-migration branch from 02ad2e7 to b8dfd5c Compare October 23, 2025 12:06
@blefo blefo requested a review from jcabrero October 24, 2025 07:57
@blefo blefo marked this pull request as ready for review October 24, 2025 07:58
Comment on lines 46 to 47
code = args.get("code", "")
result = await code_execution.execute_python(code)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Execute empty code? Maybe not worth even trying and excepting.

result = await code_execution.execute_python(code)

output_json_string = json.dumps({"result": str(result).strip()})
logger.info(f"[responses_tool] execute_python result: {result.strip()}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One part of being a totally private inference service is not logging everything the user does. We could add DEBUG logs to check this when testing but maybe not for prod.

output_json_string = json.dumps(
{"error": "Invalid JSON in tool call arguments."}
)
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On error, it is reasonable to have logger.error

if not tool_calls:
return first_response, prompt_tokens, completion_tokens

unknown = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you improve tha naming of unknown to unknown_tool or something?

return first_response, prompt_tokens, completion_tokens

unknown = [
tc for tc in tool_calls if tc["name"] not in CONFIG.tools.implemented_tools
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want implemented_tools to be a config value? Happy to debate it, but, the tools that are available are determined by the code that you implement, not by a config value. If we want to have some config it can be used to deactivate certain tools. The configuration, in any case, should not determine the implementation.

Maybe the wording is not right and should be expressed somehow else.

Comment on lines 166 to 170
unknown = [
tc
for tc in tool_calls
if tc.function.name not in CONFIG.tools.implemented_tools
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. unknown is not a good variable name. Should be called unknown_tool_handlers or similar.

return WebSearchEnhancedMessages(messages=req.messages, sources=all_sources)


async def enhance_input_with_web_search(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docstrings

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the function contents for these files seem a clone of the functions present in web_search.py. I would remove copied parts and try to share the common functionality among the different functions. Otherwise, a change to one section of the code involves changing 4 times the same thing. Same for the prompts. Try to have one referenced multiple times if it is possible.

Comment on lines 154 to 156
logger.info(
f"[responses] start request_id={request_id} user={auth_info.user.userid} model={model_name} stream={req.stream} web_search={bool(req.web_search)} tools={bool(req.tools)} multimodal={has_multimodal} url={model_url}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not great having detailed logging if it is not for debug purposes

@@ -209,7 +160,7 @@ def is_text_part(self) -> bool:

def is_multimodal_part(self) -> bool:
c = self.content
if c is None: # tool calling message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this comment was removed? Isn't it accurate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this file require these changes?

assert forbidden, "No NILDB command detected, when expected"


@pytest.mark.rerun(3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to rerun it 3 times? Does this mean it can fail 2/3 times?

{
"role": "system",
"content": "You are a helpful assistant that provides accurate and concise information.",
"content": "You are a helpful assistant that provides accurate and concise information. You must use the get_weather tool to get the weather information.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change required, given the test is skipped?

},
}
],
tool_choice={"type": "function", "function": {"name": "get_weather"}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Is this change required?

mock_async_openai_instance = MagicMock()
mock_async_openai_instance.chat = mock_chat
mocker.patch(
"nilai_api.routers.private.AsyncOpenAI", return_value=mock_async_openai_instance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a lot of comments deleted with no apparent reason.

Copy link
Member

@jcabrero jcabrero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the most part, it looks good to me. This is a huge PR though. I think it would be positive if you check the Files Changed in the PR. I think there are many of the comments that are obvious from this view if you check whether comments are removed, or docstrings are missing, or certain unused tests receive changes.

There are some very good reorganizations of the code though.

blefo added 3 commits October 28, 2025 12:12
…rch handling

- Removed ToolsConfig and related configurations from the project.
- Updated web search logic to improve handling of multiple topic queries and responses.
- Enhanced error handling and logging in tool execution workflows.
- Refactored response handling to streamline the integration of web search results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants