Magic LLM is a unified, simplified wrapper for connecting to a wide range of LLM providers, including:
- OpenAI
- Cloudflare
- AWS Bedrock
- Google AI Studio
- Cohere
- Anthropic
- Cerebras
- SambaNova
- DeepInfra
- Deepseek
- Parasail
- x.ai (Grok)
- Together.AI
- OpenRouter
- NovitaAI
- Mistral
- Hyperbolic
- Groq
- Fireworks.AI
- Perplexity AI
- Azure
Note: Many of these providers have been verified compatible with OpenAI's API or are supported natively. Some may require non-standard argument/credential patterns.
| Provider | Streaming | Completion | Embedding | Audio | Async Streaming | Async Completion | Function calling | Fallback | Callback |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cloudflare | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| AWS Bedrock | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Google AI Studio | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Cohere | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cerebras | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SambaNova | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DeepInfra | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Deepseek | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Parasail | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| x.ai | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Together.AI | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Perplexity AI | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenRouter | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| NovitaAI | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Mistral | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hyperbolic | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Groq | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Fireworks.AI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Azure | ❓ | ❓ | ❌ | ✅ | ❓ | ❓ | ❓ | ✅ | ✅ |
Legend:
- Streaming: Supports incremental streamed responses
- Completion: Supports single result completion
- Embedding: Supports text embedding generation
- Audio: Supports transcription (speech-to-text)
- Async Streaming / Async Completion: async variants of the streaming and completion methods are available
- Fallback: Can automatically fallback to alternate client
- Callback: Callback support on streamed outputs
- ❓ = Not fully tested
- Streamed chat completion (sync & async)
- Non-stream (single result) completion (sync & async)
- Usage metrics for every call (tokens, latency, etc.)
- Callback hook on streamed outputs (for logging, progress bar, etc.)
- Fallback support: transparently try alternate client on error
- Unified error handling (raises `ChatException`)
- Embedding
- Audio transcription (speech-to-text, for compatible models/providers)
- Return types compatible with OpenAI client (for easy integration)
- Vision models (OpenAI/Anthropic; partial Google support)
- Function calling (tested with OpenAI)
- Text-to-Speech (currently: OpenAI only)
- Vision adapter for Google AI Studio
- More TTS/vision support in other providers
Magic LLM is designed as the backend core for Magic UI, an application generator (RAG) and multivendor LLM front-end. It is not a full OpenAI client replacement, but strives for wide usability and API/response shape compatibility.
```bash
pip install git+https://github.com/Andres77872/magic-llm.git
```

```python
from magic_llm import MagicLLM

# Example for an OpenAI API compatible endpoint
client = MagicLLM(
    engine='openai',
    model='gpt-4o',
    private_key='sk-...',
)
```

Other providers (e.g. `engine='anthropic'`, `engine='google'`, ...) use analogous fields; see below.
```python
from magic_llm.model import ModelChat

# Build a chat with a system prompt and a user message
chat = ModelChat(system="You are a helpful assistant.")
chat.add_user_message("What is the largest animal on earth?")
```

```python
# Non-streaming (whole-result) completion with usage metrics
response = client.llm.generate(chat)
print(response.content)
print("Prompt tokens:", response.usage.prompt_tokens)
print("Completion tokens:", response.usage.completion_tokens)
```

```python
# Streaming completion (token by token)
for chunk in client.llm.stream_generate(chat):
    print(chunk.choices[0].delta.content or '', end='', flush=True)
print()  # Newline at end
```
```python
import asyncio

# Async streaming (token by token)
async def main():
    async for chunk in client.llm.async_stream_generate(chat):
        print(chunk.choices[0].delta.content or '', end='', flush=True)

asyncio.run(main())
```
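The whole-result call also has an async variant (`async_generate`, listed in the interface summary below). A minimal sketch reusing the `client` and `chat` from above:

```python
import asyncio

async def main():
    # Async, whole-result completion
    response = await client.llm.async_generate(chat)
    print(response.content)

asyncio.run(main())
```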
If you provide an invalid model or the provider fails, MagicLLM can automatically fall back to a second client.

```python
# Set up a fallback chain (model 'bad-model' fails, fallback is 'gpt-4o')
client_fallback = MagicLLM(engine='openai', model='gpt-4o', private_key='sk-...')
client = MagicLLM(
    engine='openai',
    model='bad-model',
    private_key='sk-...',
    fallback=client_fallback,
)

response = client.llm.generate(chat)
print(response.content)  # Uses fallback auto-magically if first fails!
```

Attach a callback for every final output (after fallback, if needed):
```python
def on_chunk(msg, content, usage, model_name, meta):
    print(f"Used model {model_name}: [{usage.prompt_tokens}pt >> {usage.completion_tokens}ct] {content}")

client = MagicLLM(
    engine='openai',
    model='bad-model',
    private_key='sk-...',
    fallback=client_fallback,
    callback=on_chunk,
)

for chunk in client.llm.stream_generate(chat):
    pass  # Content handled by callback!
```

If your provider or endpoint supports the OpenAI Whisper API (e.g. OpenAI, Azure, DeepInfra, Groq, Fireworks), you can transcribe audio files with a unified API:
```python
from magic_llm.model.ModelAudio import AudioTranscriptionsRequest
from magic_llm import MagicLLM

client = MagicLLM(engine='openai', private_key='sk-...')

with open('speech.mp3', 'rb') as f:
    data = AudioTranscriptionsRequest(file=f.read(), model="whisper-1")

response = client.llm.sync_audio_transcriptions(data)
print(response['text'])

# Async version:
# await client.llm.async_audio_transcriptions(data)
```

Some providers support embeddings using the unified API. Example (Together.AI shown):
```python
client = MagicLLM(engine='together.ai', private_key='sk-...', model='BAAI/bge-base-en-v1.5')
resp = client.llm.embedding(text="How much wood would a woodchuck chuck?")
print(resp)  # List[float]
```

Use `run_agentic` to let the model call your Python tools iteratively until it produces a normal answer. Tools can be passed in two ways:
- Callables: simplest; the function's `__name__` is the tool name.
- JSON tool specs: provide OpenAI-style schemas and map names to callables via `tool_functions`.
Using plain callables:

```python
from typing import Any, List

from magic_llm import MagicLLM
from magic_llm.util import run_agentic

client = MagicLLM(engine='openai', model='gpt-4o-mini', private_key='sk-...')

def add(a: int, b: int) -> int:
    return a + b

def top_k(items: List[Any], k: int = 3) -> List[Any]:
    return list(items)[:k]

resp = run_agentic(
    client=client,
    user_input="Compute 17 + 25, then take the first 2 fruits from ['apple','banana','cherry'] and summarize.",
    system_prompt="Use tools for arithmetic and list selection before answering.",
    tools=[add, top_k],
    tool_choice="auto",
    max_iterations=4,
)
print(resp.content)
```

Using JSON tool specs with a name-to-callable mapping:
```python
from magic_llm import MagicLLM
from magic_llm.util import run_agentic

client = MagicLLM(engine='openai', model='gpt-4o-mini', private_key='sk-...')

tool_specs = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two integers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "integer"},
                    "b": {"type": "integer"}
                },
                "required": ["a", "b"]
            }
        }
    }
]

def add(a: int, b: int) -> int:
    return a + b

resp = run_agentic(
    client=client,
    user_input="Use the add tool to add 7 and 35, then explain the result.",
    tools=tool_specs,
    tool_functions={"add": add},
)
print(resp.content)
```

For a complete end-to-end research + rerank demo using multiple tools (query rewriting, arXiv search, Jina reranker, and a follow-up vision analysis), see `test/test_agentic_tools_workflow.py`. The agent loop implementation lives in `magic_llm/util/agentic.py`.
OpenAI (and any OpenAI-compatible API endpoint):

```python
client = MagicLLM(
    engine='openai',
    model='gpt-4o',
    private_key='sk-...',
)
```

Cloudflare:
```python
client = MagicLLM(
    engine='cloudflare',
    model='@cf/meta/llama-2-7b-chat-int8',
    private_key='api-key',
    account_id='cf-account',
)
```

AWS Bedrock:
```python
client = MagicLLM(
    engine='amazon',
    model='amazon.nova-pro-v1:0',
    aws_access_key_id='AKIA....',
    aws_secret_access_key='...',
    region_name='us-east-1',
)
```

Google AI Studio:
```python
client = MagicLLM(
    engine='google',
    model='gemini-1.5-flash',
    private_key='GOOG...',
)
```

Cohere:
```python
client = MagicLLM(
    engine='cohere',
    model='command-light',
    private_key='cohere-...',
)
```

Anthropic:
```python
client = MagicLLM(
    engine='anthropic',
    model='claude-3-haiku-20240307',
    private_key='...',
)
```

Other providers follow the same builder pattern; see the provider docs or examples. Some models/engines require special names.
All LLM providers share the same interface:
- `llm.generate(chat)` — Synchronous, whole-result
- `llm.stream_generate(chat)` — Synchronous streaming (token by token)
- `await llm.async_generate(chat)` — Async, whole-result
- `async for chunk in llm.async_stream_generate(chat)` — Async, token stream
- `llm.embedding(text=...)` — Get embeddings (supported providers only)
- `llm.audio_transcriptions(data)` — Speech-to-text (supported providers/models only)
- All return OpenAI-compatible objects wherever possible
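Because every provider exposes this same interface, swapping vendors is a constructor change rather than a rewrite. A minimal sketch (credentials are placeholders; the engine and model names are the ones used earlier in this README):

```python
from magic_llm import MagicLLM
from magic_llm.model import ModelChat

chat = ModelChat(system="You are a helpful assistant.")
chat.add_user_message("Summarize the benefits of a unified LLM client in one sentence.")

# The same ModelChat and generate() call work against any configured provider.
for client in (
    MagicLLM(engine='openai', model='gpt-4o', private_key='sk-...'),
    MagicLLM(engine='anthropic', model='claude-3-haiku-20240307', private_key='...'),
):
    response = client.llm.generate(chat)
    print(response.content)
```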
- Frictionless provider swapping: One codebase, many clouds.
- Error handling: Consistent exceptions (`ChatException`) for failed requests; see the sketch below.
- Streaming-first: Low-latency, responsive outputs.
- Callbacks & metrics: Plug in your logger/progressbar/UI easily.
- Fallback: Automatic failover to a backup provider/credentials.
- Minimal dependencies: No hard OpenAI client requirement.
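A minimal error-handling sketch; note that the exact import path for `ChatException` is an assumption here and may differ in your installed version:

```python
from magic_llm import MagicLLM
from magic_llm.model import ModelChat
from magic_llm.exception import ChatException  # assumed import path; adjust to your version

client = MagicLLM(engine='openai', model='gpt-4o', private_key='sk-...')
chat = ModelChat(system="You are a helpful assistant.")
chat.add_user_message("Hello!")

try:
    response = client.llm.generate(chat)
    print(response.content)
except ChatException as e:
    # Raised consistently across providers when a request fails
    print(f"Request failed: {e}")
```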