Magic LLM is a unified, simplified wrapper for connecting to a wide range of LLM providers, including:
- OpenAI
- Cloudflare
- AWS Bedrock
- Google AI Studio
- Cohere
- Anthropic
- Cerebras
- SambaNova
- DeepInfra
- Deepseek
- Parasail
- x.ai (Grok)
- Together.AI
- OpenRouter
- NovitaAI
- Mistral
- Hyperbolic
- Groq
- Fireworks.AI
- Perplexity AI
- Azure
Note: Many of these providers have been verified compatible with OpenAI's API or are supported natively. Some may require non-standard argument/credential patterns.
| Provider | Streaming | Completion | Embedding | Audio | Async Streaming | Async Completion | Function calling | Fallback | Callback |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cloudflare | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| AWS Bedrock | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Google AI Studio | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Cohere | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cerebras | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SambaNova | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DeepInfra | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Deepseek | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Parasail | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| x.ai | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Together.AI | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Perplexity AI | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenRouter | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| NovitaAI | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Mistral | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hyperbolic | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Groq | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Fireworks.AI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Azure | ❓ | ❓ | ❌ | ✅ | ❓ | ❓ | ❓ | ✅ | ✅ |
Legend:
- Streaming: Supports incremental streamed responses
- Completion: Supports single result completion
- Embedding: Supports text embedding generation
- Audio: Supports transcription (speech-to-text)
- Async Streaming / Async Completion: async variants of the streaming and completion methods are available
- Fallback: Can automatically fallback to alternate client
- Callback: Callback support on streamed outputs
- ❓ = Not fully tested
- Streamed chat completion (sync & async)
- Non-stream (single result) completion (sync & async)
- Usage metrics for every call (tokens, latency, etc.)
- Callback hook on streamed outputs (for logging, progress bar, etc.)
- Fallback support: transparently try alternate client on error
- Unified error handling (raises `ChatException`)
- Embedding
- Audio transcription (speech-to-text, for compatible models/providers)
- Return types compatible with OpenAI client (for easy integration)
- Vision models (OpenAI/Anthropic; partial Google support)
- Function calling (tested with OpenAI)
- Text-to-Speech (currently: OpenAI only)
- Vision adapter for Google AI Studio
- More TTS/vision support in other providers
Magic LLM is designed as the backend core for Magic UI, an application generator (RAG) and multivendor LLM front-end. It is not a full OpenAI client replacement, but strives for wide usability and API/response shape compatibility.
```bash
pip install git+https://github.com/Andres77872/magic-llm.git
```

```python
from magic_llm import MagicLLM

# Example for an OpenAI API compatible endpoint
client = MagicLLM(
    engine='openai',
    model='gpt-4o',
    private_key='sk-...',
)
```

Other providers (e.g. `engine='anthropic'`, `engine='google'`, ...) use analogous fields; see below.
```python
from magic_llm.model import ModelChat

# Build a chat with a system prompt and a user message
chat = ModelChat(system="You are a helpful assistant.")
chat.add_user_message("What is the largest animal on earth?")
```

```python
# Non-streaming (whole-result) completion with usage metrics
response = client.llm.generate(chat)
print(response.content)
print("Prompt tokens:", response.usage.prompt_tokens)
print("Completion tokens:", response.usage.completion_tokens)
```

```python
# Streaming completion (token by token)
for chunk in client.llm.stream_generate(chat):
    print(chunk.choices[0].delta.content or '', end='', flush=True)
print()  # Newline at end
```
```python
import asyncio

# Async streaming (token by token)
async def main():
    async for chunk in client.llm.async_stream_generate(chat):
        print(chunk.choices[0].delta.content or '', end='', flush=True)

asyncio.run(main())
```
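The whole-result call also has an async variant (`async_generate`, listed in the interface summary below). A minimal sketch reusing the `client` and `chat` from above:

```python
import asyncio

async def main():
    # Async, whole-result completion
    response = await client.llm.async_generate(chat)
    print(response.content)

asyncio.run(main())
```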
If you provide an invalid model or the provider fails, MagicLLM can automatically fall back to a second client.

```python
# Set up a fallback chain (model 'bad-model' fails, fallback is 'gpt-4o')
client_fallback = MagicLLM(engine='openai', model='gpt-4o', private_key='sk-...')
client = MagicLLM(
    engine='openai',
    model='bad-model',
    private_key='sk-...',
    fallback=client_fallback,
)

response = client.llm.generate(chat)
print(response.content)  # Uses fallback auto-magically if first fails!
```

Attach a callback for every final output (after fallback, if needed):
```python
def on_chunk(msg, content, usage, model_name, meta):
    print(f"Used model {model_name}: [{usage.prompt_tokens}pt >> {usage.completion_tokens}ct] {content}")

client = MagicLLM(
    engine='openai',
    model='bad-model',
    private_key='sk-...',
    fallback=client_fallback,
    callback=on_chunk,
)

for chunk in client.llm.stream_generate(chat):
    pass  # Content handled by callback!
```

If your provider or endpoint supports the OpenAI Whisper API (e.g. OpenAI, Azure, DeepInfra, Groq, Fireworks), you can transcribe audio files with a unified API:
```python
from magic_llm.model.ModelAudio import AudioTranscriptionsRequest
from magic_llm import MagicLLM

client = MagicLLM(engine='openai', private_key='sk-...')

with open('speech.mp3', 'rb') as f:
    data = AudioTranscriptionsRequest(file=f.read(), model="whisper-1")

response = client.llm.sync_audio_transcriptions(data)
print(response['text'])

# Async version:
# await client.llm.async_audio_transcriptions(data)
```

Some providers support embeddings using the unified API. Example (Together.AI shown):
```python
client = MagicLLM(engine='together.ai', private_key='sk-...', model='BAAI/bge-base-en-v1.5')
resp = client.llm.embedding(text="How much wood would a woodchuck chuck?")
print(resp)  # List[float]
```

Use `run_agentic` to let the model call your Python tools iteratively until it produces a normal answer. Tools can be passed in two ways:
- Callables: simplest; the function's `__name__` is the tool name.
- JSON tool specs: provide OpenAI-style schemas and map names to callables via `tool_functions`.
Using plain callables:

```python
from typing import Any, List

from magic_llm import MagicLLM
from magic_llm.util import run_agentic

client = MagicLLM(engine='openai', model='gpt-4o-mini', private_key='sk-...')

def add(a: int, b: int) -> int:
    return a + b

def top_k(items: List[Any], k: int = 3) -> List[Any]:
    return list(items)[:k]

resp = run_agentic(
    client=client,
    user_input="Compute 17 + 25, then take the first 2 fruits from ['apple','banana','cherry'] and summarize.",
    system_prompt="Use tools for arithmetic and list selection before answering.",
    tools=[add, top_k],
    tool_choice="auto",
    max_iterations=4,
)
print(resp.content)
```

Using JSON tool specs with a name-to-callable mapping:
```python
from magic_llm import MagicLLM
from magic_llm.util import run_agentic

client = MagicLLM(engine='openai', model='gpt-4o-mini', private_key='sk-...')

tool_specs = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two integers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "integer"},
                    "b": {"type": "integer"}
                },
                "required": ["a", "b"]
            }
        }
    }
]

def add(a: int, b: int) -> int:
    return a + b

resp = run_agentic(
    client=client,
    user_input="Use the add tool to add 7 and 35, then explain the result.",
    tools=tool_specs,
    tool_functions={"add": add},
)
print(resp.content)
```

For a complete end-to-end research + rerank demo using multiple tools (query rewriting, arXiv search, Jina reranker, and a follow-up vision analysis), see `test/test_agentic_tools_workflow.py`. The agent loop implementation lives in `magic_llm/util/agentic.py`.
OpenAI (and any OpenAI-compatible API endpoint):

```python
client = MagicLLM(
    engine='openai',
    model='gpt-4o',
    private_key='sk-...',
)
```

Cloudflare:
```python
client = MagicLLM(
    engine='cloudflare',
    model='@cf/meta/llama-2-7b-chat-int8',
    private_key='api-key',
    account_id='cf-account',
)
```

AWS Bedrock:
```python
client = MagicLLM(
    engine='amazon',
    model='amazon.nova-pro-v1:0',
    aws_access_key_id='AKIA....',
    aws_secret_access_key='...',
    region_name='us-east-1',
)
```

Google AI Studio:
```python
client = MagicLLM(
    engine='google',
    model='gemini-1.5-flash',
    private_key='GOOG...',
)
```

Cohere:
```python
client = MagicLLM(
    engine='cohere',
    model='command-light',
    private_key='cohere-...',
)
```

Anthropic:
```python
client = MagicLLM(
    engine='anthropic',
    model='claude-3-haiku-20240307',
    private_key='...',
)
```

Other providers follow the same builder pattern; see the provider docs or examples. Some models/engines require special names.
All LLM providers share the same interface:
- `llm.generate(chat)` — Synchronous, whole-result
- `llm.stream_generate(chat)` — Synchronous streaming (token by token)
- `await llm.async_generate(chat)` — Async, whole-result
- `async for chunk in llm.async_stream_generate(chat)` — Async, token stream
- `llm.embedding(text=...)` — Get embeddings (supported providers only)
- `llm.audio_transcriptions(data)` — Speech-to-text (supported providers/models only)
- All return OpenAI-compatible objects wherever possible
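Because every provider exposes this same interface, swapping vendors is a constructor change rather than a rewrite. A minimal sketch (credentials are placeholders; the engine and model names are the ones used earlier in this README):

```python
from magic_llm import MagicLLM
from magic_llm.model import ModelChat

chat = ModelChat(system="You are a helpful assistant.")
chat.add_user_message("Summarize the benefits of a unified LLM client in one sentence.")

# The same ModelChat and generate() call work against any configured provider.
for client in (
    MagicLLM(engine='openai', model='gpt-4o', private_key='sk-...'),
    MagicLLM(engine='anthropic', model='claude-3-haiku-20240307', private_key='...'),
):
    response = client.llm.generate(chat)
    print(response.content)
```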
- Frictionless provider swapping: One codebase, many clouds.
- Error handling: Consistent exceptions (`ChatException`) for failed requests; see the sketch below.
- Streaming-first: Low-latency, responsive outputs.
- Callbacks & metrics: Plug in your logger/progressbar/UI easily.
- Fallback: Automatic failover to a backup provider/credentials.
- Minimal dependencies: No hard OpenAI client requirement.
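A minimal error-handling sketch; note that the exact import path for `ChatException` is an assumption here and may differ in your installed version:

```python
from magic_llm import MagicLLM
from magic_llm.model import ModelChat
from magic_llm.exception import ChatException  # assumed import path; adjust to your version

client = MagicLLM(engine='openai', model='gpt-4o', private_key='sk-...')
chat = ModelChat(system="You are a helpful assistant.")
chat.add_user_message("Hello!")

try:
    response = client.llm.generate(chat)
    print(response.content)
except ChatException as e:
    # Raised consistently across providers when a request fails
    print(f"Request failed: {e}")
```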