feat: new /v1/responses #161

blefo · 2025-10-10T13:40:58Z

Overview

Implements the OpenAI-compatible /v1/responses API, a more flexible alternative to Chat Completions with structured input, tool calling, web search, and streaming support.

Key Features

New /v1/responses endpoint
- OpenAI-compatible; supports streaming/non-streaming
- Integrated auth, rate limits, token tracking, and response signing
Advanced Tool Handling
- Auto-detects and executes tool calls
- Multi-turn workflows with context preservation
- Supports Python execution and concurrent tools
Web Search Integration
- Optional Brave Search enrichment via web_search param
- Adds real-time context with source attribution
Multimodal Support
- Image input validation for compatible models
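
As a rough illustration of the request shape, the sketch below builds (but does not send) a POST to /v1/responses using only the standard library. The base URL, API key, and helper name are placeholders. Passing `web_search` as a top-level boolean in the body is an assumption based on the parameter name above; the authoritative schema is in `responses_model.py`.

```python
import json
import urllib.request

# Placeholder values for illustration; not taken from this PR.
BASE_URL = "http://localhost:8080"
API_KEY = "YOUR_API_KEY"


def build_responses_request(prompt, model, web_search=False, stream=False):
    """Build (without sending) a POST request for /v1/responses."""
    payload = {
        "model": model,
        "input": prompt,  # the Responses API accepts a string or a list of input items
        "stream": stream,
    }
    if web_search:
        # Assumption: web search is enabled through a top-level boolean flag.
        payload["web_search"] = True
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_responses_request(
    "Summarize today's top technology news.", "gpt-oss-20b", web_search=True
)
```

Sending the request is then a matter of `urllib.request.urlopen(request)` against a running instance, or pointing the OpenAI client's `base_url` at the same host.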

Architecture

- Split private.py into modular endpoints:
  - /v1/chat/completions → chat.py
  - /v1/responses → responses.py
- New responses_tool_router.py for tool workflows
- Modular API models in nilai-common

Technical Highlights

- Validates model capabilities (tools, multimodal, web search)
- Supports NilDB prompt retrieval and signed responses
- SSE streaming with token usage and attribution
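
For readers consuming the stream by hand, here is a minimal sketch of reading SSE lines and collecting text plus token usage. It assumes the endpoint emits OpenAI-style Responses events (`response.output_text.delta` and `response.completed`); the sample stream is hand-written, and the function names are hypothetical rather than part of this PR.

```python
import json


def parse_sse(lines):
    """Yield the decoded JSON payload of each SSE `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip `event:` lines and blank separators
        yield json.loads(line[len("data:"):].strip())


def collect_stream(lines):
    """Concatenate text deltas and pick up usage from the final event."""
    text_parts, usage = [], None
    for event in parse_sse(lines):
        if event.get("type") == "response.output_text.delta":
            text_parts.append(event["delta"])
        elif event.get("type") == "response.completed":
            usage = event["response"].get("usage")
    return "".join(text_parts), usage


# Hand-written sample stream, not captured from a real server.
sample = [
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "Hel"}',
    "",
    'data: {"type": "response.output_text.delta", "delta": "lo"}',
    "",
    'data: {"type": "response.completed", "response": '
    '{"usage": {"input_tokens": 5, "output_tokens": 2}}}',
]
text, usage = collect_stream(sample)
```

In practice `lines` would come from iterating over a streaming HTTP response; any attribution fields the server adds would appear as extra keys on the events and are not modeled here.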

…ructure

- Updated OpenAI dependency to version 1.99.2 in both `pyproject.toml` files for `nilai-api` and `nilai-common`.
- Enhanced response model in `responses_model.py` by adding new fields and improving type definitions.
- Refactored response handling in `responses.py` to include usage tracking for input and output tokens.
- Adjusted import statements in `__init__.py` to streamline model access.

- Updated return types in `route_and_execute_tool_call` and `process_tool_calls` to use `FunctionCallOutput`.
- Improved error handling and logging in tool execution.
- Adjusted input handling in `handle_responses_tool_workflow` to support lists of `ResponseInputParam`.
- Added new imports for `FunctionCallOutput` and related types in `nilai_common` models.

- Changed `ResponseFunctionToolCall` to `ResponseFunctionToolCallParam` in multiple functions for better type consistency.
- Enhanced `handle_responses_tool_workflow` to utilize new input item types and improved handling of tool call results.
- Updated imports in `__init__.py` and other files to reflect new model structures.
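
To make the tool round trip in these commits concrete, here is a simplified sketch using plain dicts in the OpenAI Responses item format (`function_call` in, `function_call_output` out). The registry, the `add` tool, and the function names are hypothetical; the PR's own `route_and_execute_tool_call` and `process_tool_calls` use typed models and may differ in signature and behavior.

```python
import json


def add(a, b):
    """Toy tool used only for this illustration."""
    return a + b


# Hypothetical registry mapping tool names to callables.
TOOLS = {"add": add}


def execute_function_call(call):
    """Run one function_call item and return a function_call_output item."""
    try:
        func = TOOLS[call["name"]]
        result = func(**json.loads(call["arguments"]))
        output = json.dumps(result)
    except Exception as exc:
        # Report the failure back to the model instead of raising.
        output = json.dumps({"error": f"{type(exc).__name__}: {exc}"})
    return {
        "type": "function_call_output",
        "call_id": call["call_id"],
        "output": output,
    }


def process_function_calls(output_items):
    """Return outputs for every function_call item in a model response."""
    return [
        execute_function_call(item)
        for item in output_items
        if item.get("type") == "function_call"
    ]


model_output = [
    {"type": "function_call", "call_id": "call_1", "name": "add",
     "arguments": '{"a": 2, "b": 3}'},
    {"type": "function_call", "call_id": "call_2", "name": "missing",
     "arguments": "{}"},
]
results = process_function_calls(model_output)
```

In a multi-turn workflow, the returned items would be appended to the input list and sent back to the model so that it can produce a final answer with the tool results in context.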

…e tests architecture

- Changed EC2 instance type from g4dn.xlarge to g6.xlarge in the CI workflow.
- Updated the docker-compose command to use the new GPT-20B configuration file.
- Added a new docker-compose file for the GPT-20B GPU service, including environment settings and health checks.
- Updated the CI model reference in the test configuration to use the new GPT-20B model.

…+ ci model configuration

- Changed environment setting from "ci" to "mainnet" in config.yaml.
- Updated authentication strategy to "api_key".
- Adjusted rate limiting parameters for web search functionality.
- Refactored chat completion logic to improve handling of tool support and request parameters.
- Added detailed docstrings to various classes and methods for better clarity.

Copilot

Pull Request Overview

This PR implements a new OpenAI-compatible /v1/responses API endpoint alongside architectural improvements to the codebase. The endpoint provides a more flexible interface for AI interactions with support for structured input, tool calling, web search, and streaming capabilities.

Key changes:

- Adds a new /v1/responses endpoint with full feature parity with chat completions
- Refactors the monolithic private.py into modular endpoint files (chat.py, responses.py)
- Introduces responses_tool_router.py for responses-specific tool workflow handling
- Restructures the nilai-common package from a single api_model.py into an api_models/ module with separate files per API type

Reviewed Changes

Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.


File	Description
`tests/unit/nilai_api/test_web_search.py`	Updates import path from `api_model` to `api_models`
`tests/unit/nilai_api/routers/test_responses_private.py`	New test suite for responses endpoint covering standard requests, streaming, and tool workflows
`tests/unit/nilai_api/routers/test_nildb_endpoints.py`	Updates import paths and adds tests for responses endpoint with nilDB integration
`tests/unit/nilai_api/routers/test_chat_completions_private.py`	Updates import paths and mocks for refactored chat endpoint
`tests/unit/nilai_api/handlers/tools/test_responses_tool_router.py`	New test suite for responses-specific tool routing and execution
`tests/unit/nilai-common/test_discovery.py`	Updates import path from `api_model` to `api_models`
`tests/unit/nilai-common/test_api_model.py`	Updates import path from `api_model` to `api_models`
`tests/unit/__init__.py`	Adds test fixtures for responses API models
`tests/e2e/test_responses_http.py`	Comprehensive E2E tests for responses endpoint using HTTP client
`tests/e2e/test_responses.py`	E2E tests for responses endpoint using OpenAI client
`tests/e2e/test_chat_completions_http.py`	Updates and additions for chat completions testing
`tests/e2e/test_chat_completions.py`	Updates and additions for chat completions testing
`tests/e2e/nuc.py`	Extends NUC token expiration from 5 minutes to 1 hour
`tests/e2e/config.py`	Changes CI test model from Llama-3.2-1B to gpt-oss-20b
`packages/nilai-common/src/nilai_common/discovery.py`	Updates import path from `api_model` to `api_models`
`packages/nilai-common/src/nilai_common/api_models/responses_model.py`	New module defining responses API models
`packages/nilai-common/src/nilai_common/api_models/common_model.py`	Shared models extracted from chat_completion_model
`packages/nilai-common/src/nilai_common/api_models/chat_completion_model.py`	Refactored to use common models and remove duplicates
`packages/nilai-common/src/nilai_common/api_models/__init__.py`	Module initialization with all API model exports
`packages/nilai-common/src/nilai_common/__init__.py`	Updates imports to use new api_models module
`packages/nilai-common/pyproject.toml`	Upgrades OpenAI SDK from 1.59.9 to 1.99.2
`nilai-attestation/src/nilai_attestation/attestation/nvtrust/nv_verifier.py`	Updates import path from `api_model` to `api_models`
`nilai-api/src/nilai_api/state.py`	Updates import path from `api_model` to `api_models`
`nilai-api/src/nilai_api/routers/private.py`	Refactored to include modular endpoint routers instead of implementing all endpoints
`nilai-api/src/nilai_api/routers/endpoints/responses.py`	New responses endpoint implementation with full feature support
`nilai-api/src/nilai_api/routers/endpoints/chat.py`	Extracted chat completions endpoint from private.py
`nilai-api/src/nilai_api/handlers/web_search.py`	Adds web search support for responses API
`nilai-api/src/nilai_api/handlers/tools/tool_router.py`	Adds unknown tool validation for chat completions
`nilai-api/src/nilai_api/handlers/tools/responses_tool_router.py`	New tool router for responses API workflow
`nilai-api/src/nilai_api/config/tools.py`	New configuration model for tool settings
`nilai-api/src/nilai_api/config/config.yaml`	Adds tools configuration and adjusts web search rate limits
`nilai-api/src/nilai_api/config/__init__.py`	Integrates tools configuration into main config
`nilai-api/pyproject.toml`	Upgrades OpenAI SDK from 1.59.9 to 1.99.2
`docker/compose/docker-compose.gpt-20b-gpu.ci.yml`	New Docker compose configuration for GPT-20B model
`.github/workflows/cicd.yml`	Updates CI to use GPT-20B model and US-East-1 region
`scripts/wait_for_ci_services.sh`	Updates to wait for GPT-20B container instead of Llama-1B


.github/workflows/cicd.yml