
Conversation


@drandahl drandahl commented Oct 31, 2025

Motivation and Context

  1. Why is this change required? Requested per Create Chat Client which used LiteLLM #194
  2. What problem does it solve? Enables easier configuration across various model providers.
  3. What scenario does it contribute to? Create Chat Client which used LiteLLM #194
  4. If it fixes an open issue, please link to the issue here. Create Chat Client which used LiteLLM #194

Description

This change introduces an agent-framework-litellm package that extends OpenAIBaseChatClient and OpenAIBaseResponsesClient into two new clients backed by LiteLLM. Because LiteLLM is largely compatible with OpenAI, we first translate the LiteLLM format into the OpenAI format and then leverage our existing OpenAI->AF translation layer to complete the translation to AF.
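
A minimal usage sketch of the new chat client is shown below; the import path and model string are illustrative assumptions, not taken verbatim from the package:

    # Minimal usage sketch (assumed module name and model string; the actual package
    # layout may differ). get_response mirrors the existing OpenAI-based clients.
    import asyncio

    from agent_framework_litellm import LiteLlmChatClient  # assumed import path

    async def main() -> None:
        # Any provider/model string that LiteLLM understands, e.g. "azure/gpt-4o".
        client = LiteLlmChatClient(model_id="azure/gpt-4o")
        response = await client.get_response("What is the capital of France?")
        print(response.text)

    asyncio.run(main())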

Note that we have noticed cases where LiteLLM may deviate from OpenAI, in particular to support special non-OpenAI features (see https://docs.litellm.ai/docs/reasoning_content). This PR does not exhaustively cover integration against all providers. The following has been manually tested:

Integration tests were written and executed against Azure OpenAI as the provider, with tests copied directly from the OpenAI client tests.

Two features are known not to be supported by this PR:

  • StructuredOutput for the Responses client (this requires a responses.parse API, which is not available through LiteLLM).
  • HostedWebSearchTool, HostedCodeInterpreterTool, and HostedFileSearchTool tool calls (these require investigation and were taken as out of scope for this work). Custom tool calls are confirmed to work.

Contribution Checklist

  • [ X ] The code builds clean without any errors or warnings
  • [ X ] The PR follows the Contribution Guidelines
  • [ X ] All unit tests pass, and I have added new tests where possible
  • [ X ] Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings October 31, 2025 21:21
@markwallace-microsoft markwallace-microsoft added the "documentation" (Improvements or additions to documentation) and "python" labels on Oct 31, 2025
@markwallace-microsoft
Member

Python Test Coverage

Python Test Coverage Report

File      Stmts    Miss    Cover    Missing
TOTAL     11969    1841    84%
report-only-changed-files is enabled. No files were changed during this commit :)

Python Unit Test Overview

Tests    Skipped    Failures    Errors    Time
1531     123 💤     0 ❌        0 🔥      34.376s ⏱️

Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds LiteLLM integration to the Microsoft Agent Framework, enabling the framework to work with LiteLLM's unified interface for multiple LLM providers. The integration includes both chat completion and responses API clients with full test coverage and sample code.

  • Adds a new agent-framework-litellm package with LiteLlmChatClient and LiteLlmResponsesClient
  • Includes comprehensive unit and integration tests covering various scenarios
  • Provides sample code demonstrating usage of both clients
  • Updates configuration files and documentation to support the new integration

Reviewed Changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 10 comments.

Summary per file:

  • python/uv.lock: Adds agent-framework-litellm package dependencies to the lockfile
  • python/pyproject.toml: Registers the new litellm package in workspace dependencies
  • python/packages/litellm/*: Core implementation files for the LiteLLM chat and responses clients
  • python/packages/litellm/tests/*: Comprehensive test suite with unit and integration tests
  • python/samples/getting_started/chat_client/*: Sample code demonstrating LiteLLM client usage
  • python/.env.example: Adds the LITE_LLM_MODEL_ID environment variable
  • python/packages/core/pyproject.toml: Adds litellm to the "all" extras

return cast(ModelResponse, lite_llm_response)

def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
# Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI A I,

Copilot AI Oct 31, 2025


Incorrect spacing in comment: 'OpenAI A I' should be 'OpenAI API'.

Suggested change
# Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI A I,
# Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI API,

# however in the future there may be differences that need to be accounted for here.
openai_event = cast(ChatCompletionChunk, event)

# LiteLLM does not providet this as a first-class field, so we map it here.

Copilot AI Oct 31, 2025


Corrected spelling of 'providet' to 'provide'.

Suggested change
# LiteLLM does not providet this as a first-class field, so we map it here.
# LiteLLM does not provide this as a first-class field, so we map it here.

def lite_llm_to_openai_response(self, response: ModelResponse) -> ChatCompletion:
"""Convert a LiteLLM ModelResponse to an OpenAI ChatCompletion."""
# OpenAI parsing code currently directly checks for OpenAI classes. However,
# LiteLLM implements the OpenAI API via its own classes, compatable via

Copilot AI Oct 31, 2025


Corrected spelling of 'compatable' to 'compatible'.

Suggested change
# LiteLLM implements the OpenAI API via its own classes, compatable via
# LiteLLM implements the OpenAI API via its own classes, compatible via


# Development Notes

* LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.

Copilot AI Oct 31, 2025


Corrected spelling of 'vairiables' to 'variables'.

Suggested change
* LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.
* LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment variables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.


# Development Notes

* LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.

Copilot AI Oct 31, 2025


The documentation mentions 'LITE_LLM_MODEL' env param, but the code and other documentation use 'LITE_LLM_MODEL_ID'. This inconsistency should be corrected to use 'LITE_LLM_MODEL_ID' throughout.

Suggested change
* LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.
* LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL_ID" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.

Comment on lines +115 to +119
def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
# Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI A I,
# however in the future there may be differences that need to be accounted for here.
return cast(ChatCompletion, response)


Copilot AI Oct 31, 2025


The method lite_llm_to_openai_completion appears to be unused. Line 153 calls lite_llm_to_openai_response instead, which has the same implementation. Consider removing this duplicate method or clarifying if it's intentionally kept for future use.

Suggested change
def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
# Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI A I,
# however in the future there may be differences that need to be accounted for here.
return cast(ChatCompletion, response)

assert second_response.text is not None
# Should NOT use the weather tool since it was only run-level in previous call
# Call count should still be 1 (no additional calls)
assert call_count == 1

Copilot AI Oct 31, 2025


Test is always true, because of this condition.

assert second_response.text is not None
# Should NOT use the weather tool since it was only run-level in previous call
# Call count should still be 1 (no additional calls)
assert call_count == 1

Copilot AI Oct 31, 2025


Test is always true, because of this condition.

print(str(chunk), end="")
print("")
else:
response = await client.get_response(message, tools=get_weather)

Copilot AI Oct 31, 2025


This statement is unreachable.

Comment on lines +46 to +48
else:
response = await client.get_response(message, tools=[])
print(f"Assistant: {response}")

Copilot AI Oct 31, 2025


This statement is unreachable.

Suggested change
else:
response = await client.get_response(message, tools=[])
print(f"Assistant: {response}")

from pydantic import ValidationError


class LiteLlmCompletionAISettings(AFBaseSettings):
Member


Suggested change
class LiteLlmCompletionAISettings(AFBaseSettings):
class LiteLlmSettings(AFBaseSettings):


env_prefix: ClassVar[str] = "LITE_LLM_"

model_id: str | None = None
Member


why are api_key and api_base not part of these settings?


super().__init__(api_base=api_base, api_key=api_key, model_id=self.model_id, client=None, **kwargs) # type: ignore

def make_completion_streaming_request(self, **options_dict: Any) -> CustomStreamWrapper:
Member


these should probably be internal methods:

Suggested change
def make_completion_streaming_request(self, **options_dict: Any) -> CustomStreamWrapper:
def _make_completion_streaming_request(self, **options_dict: Any) -> CustomStreamWrapper:

# Completion conditionally returns this depending on streaming vs non-streaming
return cast(CustomStreamWrapper, lite_llm_response)

def make_completion_nonstreaming_request(self, **options_dict: Any) -> ModelResponse:
Member


Suggested change
def make_completion_nonstreaming_request(self, **options_dict: Any) -> ModelResponse:
def _make_completion_nonstreaming_request(self, **options_dict: Any) -> ModelResponse:

# Completion conditionally returns this depending on streaming vs non-streaming
return cast(ModelResponse, lite_llm_response)

def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
Member


Suggested change
def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
def _lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:

openai_event.usage = event.get("usage", None)
return openai_event

def lite_llm_to_openai_response(self, response: ModelResponse) -> ChatCompletion:
Member


Suggested change
def lite_llm_to_openai_response(self, response: ModelResponse) -> ChatCompletion:
def _lite_llm_to_openai_response(self, response: ModelResponse) -> ChatCompletion:


def make_completion_streaming_request(self, **options_dict: Any) -> CustomStreamWrapper:
options_dict["model"] = options_dict.pop("model_id")
lite_llm_response = completion(stream=True, **options_dict)
Member


litellm supports async, we should use that: https://github.com/BerriAI/litellm?tab=readme-ov-file#async-docs
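
For reference, a minimal sketch of LiteLLM's async API (per the link above); the model string and messages are placeholders:

    import asyncio

    from litellm import acompletion

    async def main() -> None:
        # Non-streaming: returns a ModelResponse mirroring OpenAI's ChatCompletion shape.
        response = await acompletion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response.choices[0].message.content)

        # Streaming: with stream=True the result can be iterated asynchronously.
        stream = await acompletion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello"}],
            stream=True,
        )
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="")

    asyncio.run(main())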

**kwargs: Any,
) -> ChatResponse:
options_dict = self._prepare_options(messages, chat_options)
options_dict["model_id"] = self.model_id
Member


this should already be done in the prepare_options, and people should be able to set the model at runtime.

from pydantic import ValidationError


class LiteLlmResponsesAISettings(AFBaseSettings):
Member


please reuse a single LiteLlmSettings object, differentiate with CHAT_MODEL_ID and RESPONSES_MODEL_ID if needed (similar to openai settings)
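
A sketch of that suggestion (illustrative only, using pydantic-settings' BaseSettings as a stand-in for AFBaseSettings; the field and environment variable names are assumptions modeled on the OpenAI settings pattern):

    from pydantic import SecretStr
    from pydantic_settings import BaseSettings, SettingsConfigDict

    class LiteLlmSettings(BaseSettings):
        """Single settings object shared by both clients, keyed off the LITE_LLM_ prefix."""

        model_config = SettingsConfigDict(env_prefix="LITE_LLM_")

        chat_model_id: str | None = None       # LITE_LLM_CHAT_MODEL_ID
        responses_model_id: str | None = None  # LITE_LLM_RESPONSES_MODEL_ID
        api_key: SecretStr | None = None       # LITE_LLM_API_KEY (see the api_key/api_base question above)
        api_base: str | None = None            # LITE_LLM_API_BASE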


super().__init__(api_base=api_base, api_key=api_key, model_id=self.model_id, client=None, **kwargs) # type: ignore

def make_responses_streaming_request(self, **options_dict: Any) -> SyncResponsesAPIStreamingIterator:
Member


same comments as _chat_client, use async and make these methods internal
