Python: Add LiteLLM integration for chat and responses clients (#194) #1845

base: main
Conversation
Pull Request Overview
This PR adds LiteLLM integration to the Microsoft Agent Framework, enabling the framework to work with LiteLLM's unified interface for multiple LLM providers. The integration includes both chat completion and responses API clients with full test coverage and sample code.
- Adds a new `agent-framework-litellm` package with `LiteLlmChatClient` and `LiteLlmResponsesClient` (a brief usage sketch follows this list)
- Includes comprehensive unit and integration tests covering various scenarios
- Provides sample code demonstrating usage of both clients
- Updates configuration files and documentation to support the new integration
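For orientation, here is a minimal sketch of how the new chat client might be used, pieced together from the samples and tests discussed in this review; the import path, constructor argument, and response surface shown here are assumptions rather than the finalized API:

```python
import asyncio

# Hypothetical import path; the package is named agent-framework-litellm in this PR,
# but the module layout used here is an assumption.
from agent_framework_litellm import LiteLlmChatClient


async def main() -> None:
    # The model id may also come from the LITE_LLM_MODEL_ID environment variable
    # discussed below; the value here is a placeholder.
    client = LiteLlmChatClient(model_id="azure/gpt-4o")
    response = await client.get_response("What is the capital of France?")
    print(response.text)


asyncio.run(main())
```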
Reviewed Changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 10 comments.
Summary per file:
| File | Description |
|---|---|
| python/uv.lock | Adds agent-framework-litellm package dependencies to the lockfile |
| python/pyproject.toml | Registers the new litellm package in workspace dependencies |
| python/packages/litellm/* | Core implementation files for LiteLLM chat and responses clients |
| python/packages/litellm/tests/* | Comprehensive test suite with unit and integration tests |
| python/samples/getting_started/chat_client/* | Sample code demonstrating LiteLLM client usage |
| python/.env.example | Adds LITE_LLM_MODEL_ID environment variable |
| python/packages/core/pyproject.toml | Adds litellm to the "all" extras |

        return cast(ModelResponse, lite_llm_response)

    def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
        # Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI A I,
Copilot AI commented on Oct 31, 2025:
Incorrect spacing in comment: 'OpenAI A I' should be 'OpenAI API'.
Suggested change:

    - # Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI A I,
    + # Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI API,

        # however in the future there may be differences that need to be accounted for here.

        openai_event = cast(ChatCompletionChunk, event)

        # LiteLLM does not providet this as a first-class field, so we map it here.
Copilot AI commented on Oct 31, 2025:
Corrected spelling of 'providet' to 'provide'.
Suggested change:

    - # LiteLLM does not providet this as a first-class field, so we map it here.
    + # LiteLLM does not provide this as a first-class field, so we map it here.

    def lite_llm_to_openai_response(self, response: ModelResponse) -> ChatCompletion:
        """Convert a LiteLLM ModelResponse to an OpenAI ChatCompletion."""
        # OpenAI parsing code currently directly checks for OpenAI classes. However,
        # LiteLLM implements the OpenAI API via its own classes, compatable via
Copilot AI commented on Oct 31, 2025:
Corrected spelling of 'compatable' to 'compatible'.
Suggested change:

    - # LiteLLM implements the OpenAI API via its own classes, compatable via
    + # LiteLLM implements the OpenAI API via its own classes, compatible via

    # Development Notes

    * LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.
Copilot AI commented on Oct 31, 2025:
Corrected spelling of 'vairiables' to 'variables'.
Suggested change:

    - * LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.
    + * LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment variables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.

    # Development Notes

    * LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.
Copilot AI commented on Oct 31, 2025:
The documentation mentions 'LITE_LLM_MODEL' env param, but the code and other documentation use 'LITE_LLM_MODEL_ID'. This inconsistency should be corrected to use 'LITE_LLM_MODEL_ID' throughout.
Suggested change:

    - * LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.
    + * LiteLLM doesn't have a generic way of setting config for model id (see https://docs.litellm.ai/docs/set_keys for more information on available environment vairiables). For this reason, we've added the "LITE_LLM_MODEL_ID" env param which may be set. The integration still supports LiteLLM's list of provider-specific environment variables.
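To illustrate the configuration model these notes describe, a small sketch of driving everything through the environment; LITE_LLM_MODEL_ID comes from this PR, while the Azure-specific names follow LiteLLM's documented provider keys (all values are placeholders):

```python
import os

# Model id read by the new clients added in this PR.
os.environ["LITE_LLM_MODEL_ID"] = "azure/gpt-4o"

# Provider-specific keys are still read directly by LiteLLM
# (see https://docs.litellm.ai/docs/set_keys).
os.environ["AZURE_API_KEY"] = "<your-key>"
os.environ["AZURE_API_BASE"] = "https://<resource>.openai.azure.com/"
os.environ["AZURE_API_VERSION"] = "2024-02-01"
```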

    def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
        # Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI A I,
        # however in the future there may be differences that need to be accounted for here.
        return cast(ChatCompletion, response)
Copilot AI commented on Oct 31, 2025:
The method lite_llm_to_openai_completion appears to be unused. Line 153 calls lite_llm_to_openai_response instead, which has the same implementation. Consider removing this duplicate method or clarifying if it's intentionally kept for future use.
Suggested change:

    - def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
    -     # Convert a LiteLLM ResponsesAPIResponse to an OpenAI Response. LiteLLM aims to match the OpenAI A I,
    -     # however in the future there may be differences that need to be accounted for here.
    -     return cast(ChatCompletion, response)

    assert second_response.text is not None
    # Should NOT use the weather tool since it was only run-level in previous call
    # Call count should still be 1 (no additional calls)
    assert call_count == 1
Copilot AI commented on Oct 31, 2025:
Test is always true, because of this condition.

    assert second_response.text is not None
    # Should NOT use the weather tool since it was only run-level in previous call
    # Call count should still be 1 (no additional calls)
    assert call_count == 1
Copilot AI commented on Oct 31, 2025:
Test is always true, because of this condition.

            print(str(chunk), end="")
        print("")
    else:
        response = await client.get_response(message, tools=get_weather)
Copilot AI commented on Oct 31, 2025:
This statement is unreachable.

    else:
        response = await client.get_response(message, tools=[])
        print(f"Assistant: {response}")
Copilot AI commented on Oct 31, 2025:
This statement is unreachable.
Suggested change:

    - else:
    -     response = await client.get_response(message, tools=[])
    -     print(f"Assistant: {response}")

    from pydantic import ValidationError


    class LiteLlmCompletionAISettings(AFBaseSettings):
Reviewer comment:
Suggested change:

    - class LiteLlmCompletionAISettings(AFBaseSettings):
    + class LiteLlmSettings(AFBaseSettings):

    env_prefix: ClassVar[str] = "LITE_LLM_"

    model_id: str | None = None
Reviewer comment:
why are api_key and api_base not part of these settings?
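A sketch of what the reviewer appears to be asking for, with the credentials folded into the same settings object; AFBaseSettings and env_prefix follow the snippet above, while the import path, field names, and SecretStr usage are assumptions:

```python
from typing import ClassVar

from pydantic import SecretStr

from agent_framework import AFBaseSettings  # import path assumed for this sketch


class LiteLlmSettings(AFBaseSettings):
    """Settings for the LiteLLM clients, read from LITE_LLM_* environment variables."""

    env_prefix: ClassVar[str] = "LITE_LLM_"

    model_id: str | None = None
    api_key: SecretStr | None = None
    api_base: str | None = None
```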

        super().__init__(api_base=api_base, api_key=api_key, model_id=self.model_id, client=None, **kwargs)  # type: ignore

    def make_completion_streaming_request(self, **options_dict: Any) -> CustomStreamWrapper:
Reviewer comment:
these should probably be internal methods:
Suggested change:

    - def make_completion_streaming_request(self, **options_dict: Any) -> CustomStreamWrapper:
    + def _make_completion_streaming_request(self, **options_dict: Any) -> CustomStreamWrapper:

        # Completion conditionally returns this depending on streaming vs non-streaming
        return cast(CustomStreamWrapper, lite_llm_response)

    def make_completion_nonstreaming_request(self, **options_dict: Any) -> ModelResponse:
Reviewer comment:
Suggested change:

    - def make_completion_nonstreaming_request(self, **options_dict: Any) -> ModelResponse:
    + def _make_completion_nonstreaming_request(self, **options_dict: Any) -> ModelResponse:

        # Completion conditionally returns this depending on streaming vs non-streaming
        return cast(ModelResponse, lite_llm_response)

    def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
Reviewer comment:
Suggested change:

    - def lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:
    + def _lite_llm_to_openai_completion(self, response: ModelResponse) -> ChatCompletion:

        openai_event.usage = event.get("usage", None)
        return openai_event

    def lite_llm_to_openai_response(self, response: ModelResponse) -> ChatCompletion:
Reviewer comment:
Suggested change:

    - def lite_llm_to_openai_response(self, response: ModelResponse) -> ChatCompletion:
    + def _lite_llm_to_openai_response(self, response: ModelResponse) -> ChatCompletion:

    def make_completion_streaming_request(self, **options_dict: Any) -> CustomStreamWrapper:
        options_dict["model"] = options_dict.pop("model_id")
        lite_llm_response = completion(stream=True, **options_dict)
Reviewer comment:
litellm supports async, we should use that: https://github.com/BerriAI/litellm?tab=readme-ov-file#async-docs
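For reference, LiteLLM exposes async counterparts of its sync entry points; a minimal sketch of the streaming path using acompletion (the model_id-to-model rename mirrors the snippet above, everything else is an assumption):

```python
import asyncio
from typing import Any

from litellm import acompletion  # async counterpart of litellm.completion


async def stream_completion(**options_dict: Any) -> None:
    # LiteLLM expects "model" rather than the framework's "model_id".
    options_dict["model"] = options_dict.pop("model_id")
    # With stream=True, acompletion yields chunks asynchronously.
    response = await acompletion(stream=True, **options_dict)
    async for chunk in response:
        print(chunk)


asyncio.run(
    stream_completion(
        model_id="gpt-4o-mini",  # placeholder model id
        messages=[{"role": "user", "content": "Hello"}],
    )
)
```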

        **kwargs: Any,
    ) -> ChatResponse:
        options_dict = self._prepare_options(messages, chat_options)
        options_dict["model_id"] = self.model_id
Reviewer comment:
this should already be done in the prepare_options, and people should be able to set the model at runtime.
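A tiny sketch of the behaviour the reviewer is describing, where a model id supplied at call time wins over the client-level default (the function name and parameters are hypothetical):

```python
def resolve_model_id(run_level_model_id: str | None, client_default: str | None) -> str | None:
    # Prefer the model id supplied with the request so callers can switch models
    # at runtime; otherwise fall back to the client default.
    return run_level_model_id or client_default
```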

    from pydantic import ValidationError


    class LiteLlmResponsesAISettings(AFBaseSettings):
Reviewer comment:
please reuse a single LiteLlmSettings object, differentiate with CHAT_MODEL_ID and RESPONSES_MODEL_ID if needed (similar to openai settings)
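Following the pointer to the OpenAI settings, a sketch of a single shared settings class with per-client model ids; the import path, field names, and defaults are assumptions:

```python
from typing import ClassVar

from agent_framework import AFBaseSettings  # import path assumed for this sketch


class LiteLlmSettings(AFBaseSettings):
    env_prefix: ClassVar[str] = "LITE_LLM_"

    # Read from LITE_LLM_CHAT_MODEL_ID and LITE_LLM_RESPONSES_MODEL_ID respectively,
    # mirroring how the OpenAI settings differentiate the two clients.
    chat_model_id: str | None = None
    responses_model_id: str | None = None
```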

        super().__init__(api_base=api_base, api_key=api_key, model_id=self.model_id, client=None, **kwargs)  # type: ignore

    def make_responses_streaming_request(self, **options_dict: Any) -> SyncResponsesAPIStreamingIterator:
Reviewer comment:
same comments as _chat_client, use async and make these methods internal
Motivation and Context
Description
This change introduces an agent-framework-litellm package which extends the OpenAIBaseChatClient and OpenAIBaseResponsesClient into two new clients for access via LiteLLM. Because LiteLLM is mostly compatible with OpenAI, we first translate the LiteLLM format into the OpenAI format and then leverage our existing OpenAI->AF translation layer to complete the translation to AF.
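Conceptually, the non-streaming chat path sketched in the review snippets above boils down to a two-step translation; the helper name and the cast follow those snippets, while the surrounding plumbing is an assumption:

```python
from typing import cast

from litellm import ModelResponse
from openai.types.chat import ChatCompletion


def lite_llm_to_openai_response(response: ModelResponse) -> ChatCompletion:
    # Step 1: treat the LiteLLM ModelResponse as an OpenAI ChatCompletion,
    # since LiteLLM aims to match the OpenAI shape.
    return cast(ChatCompletion, response)


# Step 2 (existing framework code, not shown): the OpenAI -> AF translation layer
# converts the ChatCompletion into an agent-framework ChatResponse.
```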
Note that we have noticed certain cases where LiteLLM may deviate from OpenAI, in particular to support special non-OpenAI features (see https://docs.litellm.ai/docs/reasoning_content). This PR does not exhaustively solve integration against all providers. The following have been manually tested:
Integration tests were written and executed against Azure OpenAI as the provider, with tests copied directly from the OpenAI client tests.
Two features are known to not currently be supported by this PR:
Contribution Checklist