Magic LLM

Magic LLM is a unified, simplified wrapper for connecting to a wide range of LLM providers, listed below.

Note: Many of these providers are verified compatible with OpenAI's API or are supported natively; some require non-standard argument or credential patterns.


Supported Providers & Capabilities

Capabilities tracked per provider: Streaming, Completion, Embedding, Audio, Async Streaming, Async Completion, Function calling, Fallback, and Callback.

  • OpenAI
  • Cloudflare
  • AWS Bedrock
  • Google AI Studio
  • Cohere
  • Anthropic
  • Cerebras
  • SambaNova
  • DeepInfra
  • Deepseek
  • Parasail
  • x.ai
  • Together.AI
  • Perplexity AI
  • OpenRouter
  • NovitaAI
  • Mistral
  • Hyperbolic
  • Groq
  • Fireworks.AI
  • Azure

Legend:

  • Streaming: Supports incremental streamed responses
  • Completion: Supports single-result completion
  • Embedding: Supports text embedding generation
  • Audio: Supports transcription (speech-to-text)
  • Async: Async streaming/completion methods available
  • Fallback: Can automatically fall back to an alternate client
  • Callback: Callback support on streamed outputs
  • ❓ = Not fully tested

Features

  • Streamed chat completion (sync & async)
  • Non-stream (single result) completion (sync & async)
  • Usage metrics for every call (tokens, latency, etc.)
  • Callback hook on streamed outputs (for logging, progress bar, etc.)
  • Fallback support: transparently try alternate client on error
  • Unified error handling (raises ChatException)
  • Embedding
  • Audio transcription (speech-to-text, for compatible models/providers)
  • Return types compatible with OpenAI client (for easy integration)
  • Vision models (OpenAI/Anthropic; partial Google support)
  • Function calling (tested with OpenAI)
  • Text-to-Speech (currently: OpenAI only)
  • Vision adapter for Google AI Studio
  • More TTS/vision support for other providers (planned)

Purpose

Magic LLM is designed as the backend core for Magic UI, a retrieval-augmented generation (RAG) application generator and multivendor LLM front-end. It is not a full OpenAI client replacement, but it strives for wide usability and API/response-shape compatibility.


Quickstart & Usage

Install

pip install git+https://github.com/Andres77872/magic-llm.git

Basic Usage Pattern

1. Build a client for any provider

from magic_llm import MagicLLM

# Example for an OpenAI API compatible endpoint
client = MagicLLM(
    engine='openai',
    model='gpt-4o',
    private_key='sk-...',
)

Other providers (e.g. engine='anthropic', engine='google', ...) use analogous fields; see the provider configurations below.

2. Compose a conversation

from magic_llm.model import ModelChat

chat = ModelChat(system="You are a helpful assistant.")
chat.add_user_message("What is the largest animal on earth?")

3. Request an answer (Non-streaming: returns full response)

response = client.llm.generate(chat)
print(response.content)
print("Prompt tokens:", response.usage.prompt_tokens)
print("Completion tokens:", response.usage.completion_tokens)

4. Streamed response (token by token, OpenAI style, sync)

for chunk in client.llm.stream_generate(chat):
    print(chunk.choices[0].delta.content or '', end='', flush=True)
print()  # Newline at end

5. Async usage (across all providers!)

import asyncio


async def main():
    async for chunk in client.llm.async_stream_generate(chat):
        print(chunk.choices[0].delta.content or '', end='', flush=True)


asyncio.run(main())
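
The whole-result call has an async counterpart as well; a minimal sketch using the async_generate method listed in the API summary below:

import asyncio


async def main():
    # Whole-result async completion (no streaming)
    response = await client.llm.async_generate(chat)
    print(response.content)


asyncio.run(main())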

Error Handling & Fallback

If you provide an invalid model or the provider fails, MagicLLM can automatically fall back to a second client.

# Setup a fallback chain (model 'bad-model' fails, fallback is 'gpt-4o')
client_fallback = MagicLLM(engine='openai', model='gpt-4o', private_key='sk-...')
client = MagicLLM(
    engine='openai',
    model='bad-model',
    private_key='sk-...',
    fallback=client_fallback,
)

response = client.llm.generate(chat)
print(response.content)  # Uses fallback auto-magically if first fails!
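
Without a fallback configured, failed requests surface as the library's unified ChatException. A minimal sketch, assuming the exception class lives at magic_llm.exception (adjust the import to your installed version):

from magic_llm import MagicLLM
from magic_llm.exception import ChatException  # assumed import path; adjust if needed

client = MagicLLM(engine='openai', model='bad-model', private_key='sk-...')

try:
    response = client.llm.generate(chat)
    print(response.content)
except ChatException as e:
    # One consistent error type across all providers
    print(f"Request failed: {e}")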

Advanced: Callbacks (monitoring, live UI, logging, etc.)

Attach a callback for every final output (after fallback, if needed):

def on_chunk(msg, content, usage, model_name, meta):
    print(f"Used model {model_name}: [{usage.prompt_tokens}pt >> {usage.completion_tokens}ct] {content}")


client = MagicLLM(
    engine='openai',
    model='bad-model',
    private_key='sk-...',
    fallback=client_fallback,
    callback=on_chunk,
)

for chunk in client.llm.stream_generate(chat):
    pass  # Content handled by callback!

Audio Transcription (Speech-to-Text)

If your provider or endpoint supports the OpenAI Whisper API (e.g. OpenAI, Azure, DeepInfra, Groq, Fireworks), you can transcribe audio files with a unified API:

from magic_llm.model.ModelAudio import AudioTranscriptionsRequest
from magic_llm import MagicLLM

client = MagicLLM(engine='openai', private_key='sk-...')
with open('speech.mp3', 'rb') as f:
    data = AudioTranscriptionsRequest(file=f.read(), model="whisper-1")
    response = client.llm.sync_audio_transcriptions(data)
    print(response['text'])

The async variant follows the same pattern; a minimal sketch wrapping the same request in an event loop:
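
import asyncio


async def transcribe():
    with open('speech.mp3', 'rb') as f:
        data = AudioTranscriptionsRequest(file=f.read(), model="whisper-1")
        # Async counterpart of sync_audio_transcriptions
        response = await client.llm.async_audio_transcriptions(data)
        print(response['text'])


asyncio.run(transcribe())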

Embeddings

Some providers support embeddings using the unified API. Example (Together.AI shown):

client = MagicLLM(engine='together.ai', private_key='sk-...', model='BAAI/bge-base-en-v1.5')
resp = client.llm.embedding(text="How much wood would a woodchuck chuck?")
print(resp)  # List[float]
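
Because the result is a plain list of floats, embeddings can be compared directly; a quick sketch computing cosine similarity with only the standard library:

import math


def cosine_similarity(a, b):
    # Dot product divided by the product of vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


emb_a = client.llm.embedding(text="How much wood would a woodchuck chuck?")
emb_b = client.llm.embedding(text="Woodchucks chucking wood")
print(cosine_similarity(emb_a, emb_b))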

Agentic Tool Workflow (Function Calling)

Use run_agentic to let the model call your Python tools iteratively until it produces a normal answer.

  • Callables: simplest; the function's __name__ is the tool name.
  • JSON tool specs: provide OpenAI-style schemas and map names to callables via tool_functions.

Minimal example (callables)

from typing import Any, List
from magic_llm import MagicLLM
from magic_llm.util import run_agentic


client = MagicLLM(engine='openai', model='gpt-4o-mini', private_key='sk-...')


def add(a: int, b: int) -> int:
    return a + b


def top_k(items: List[Any], k: int = 3) -> List[Any]:
    return list(items)[:k]


resp = run_agentic(
    client=client,
    user_input="Compute 17 + 25, then take the first 2 fruits from ['apple','banana','cherry'] and summarize.",
    system_prompt="Use tools for arithmetic and list selection before answering.",
    tools=[add, top_k],
    tool_choice="auto",
    max_iterations=4,
)

print(resp.content)

OpenAI-style tool definitions (JSON)

from magic_llm import MagicLLM
from magic_llm.util import run_agentic


client = MagicLLM(engine='openai', model='gpt-4o-mini', private_key='sk-...')


tool_specs = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two integers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "integer"},
                    "b": {"type": "integer"}
                },
                "required": ["a", "b"]
            }
        }
    }
]


def add(a: int, b: int) -> int:
    return a + b


resp = run_agentic(
    client=client,
    user_input="Use the add tool to add 7 and 35, then explain the result.",
    tools=tool_specs,
    tool_functions={"add": add},
)

print(resp.content)

For a complete end-to-end research + rerank demo using multiple tools (query rewriting, arXiv search, Jina reranker, and a follow-up vision analysis), see test/test_agentic_tools_workflow.py. The agent loop implementation lives in magic_llm/util/agentic.py.


Supported Provider Configurations

OpenAI (and any OpenAI-compatible API endpoint):

client = MagicLLM(
    engine='openai',
    model='gpt-4o',
    private_key='sk-...',
)

Cloudflare:

client = MagicLLM(
    engine='cloudflare',
    model='@cf/meta/llama-2-7b-chat-int8',
    private_key='api-key',
    account_id='cf-account',
)

AWS Bedrock:

client = MagicLLM(
    engine='amazon',
    model='amazon.nova-pro-v1:0',
    aws_access_key_id='AKIA....',
    aws_secret_access_key='...',
    region_name='us-east-1',
)

Google AI Studio:

client = MagicLLM(
    engine='google',
    model='gemini-1.5-flash',
    private_key='GOOG...',
)

Cohere:

client = MagicLLM(
    engine='cohere',
    model='command-light',
    private_key='cohere-...',
)

Anthropic:

client = MagicLLM(
    engine='anthropic',
    model='claude-3-haiku-20240307',
    private_key='...',
)

Other providers use the same builder pattern; see the provider docs or examples. Some models/engines require provider-specific model names.
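
Because every engine exposes the same interface, swapping providers only changes the constructor arguments. A minimal sketch sending one conversation through two engines (keys are placeholders):

from magic_llm import MagicLLM
from magic_llm.model import ModelChat

chat = ModelChat(system="You are a helpful assistant.")
chat.add_user_message("Name the largest animal on earth.")

# Same chat, same call, different providers
for engine, model, key in [
    ('openai', 'gpt-4o', 'sk-...'),
    ('anthropic', 'claude-3-haiku-20240307', '...'),
]:
    client = MagicLLM(engine=engine, model=model, private_key=key)
    print(engine, client.llm.generate(chat).content)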


API Summary

All LLM providers share the same interface:

  • llm.generate(chat)
    — Synchronous, whole-result
  • llm.stream_generate(chat)
    — Synchronous streaming (token by token)
  • await llm.async_generate(chat)
    — Async, whole-result
  • async for chunk in llm.async_stream_generate(chat)
    — Async, token stream
  • llm.embedding(text=...)
    — Get embeddings (supported providers only)
  • llm.sync_audio_transcriptions(data) / await llm.async_audio_transcriptions(data)
    — Speech-to-text (supported providers/models only)
  • All methods return OpenAI-compatible objects wherever possible

Design Principles

  • Frictionless provider swapping: One codebase, many clouds.
  • Error handling: Consistent exceptions (ChatException) for failed requests.
  • Streaming-first: Low-latency, responsive outputs.
  • Callbacks & metrics: Plug in your logger/progressbar/UI easily.
  • Fallback: Automatic failover to a backup provider/credentials.
  • Minimal dependencies: No hard requirement on the OpenAI client.
