Model Inference Examples

This directory contains examples of how to perform model inference using different SDKs and model types. The examples demonstrate both streaming and non-streaming modes, as well as tool calling capabilities.

Supported SDKs

  1. Clarifai SDK
  2. OpenAI Client
  3. LiteLLM

Installation

# Install Clarifai SDK
pip install clarifai

# Install OpenAI SDK
pip install openai

# Install LiteLLM
pip install litellm

Environment Setup

Set your Clarifai Personal Access Token (PAT) as an environment variable:

export CLARIFAI_PAT=your_pat_here
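
The example scripts rely on this variable. If you prefer an explicit check in your own code, a minimal guard might look like this (the error message is illustrative):

import os

# Fail fast with a clear message if the PAT is missing
pat = os.environ.get("CLARIFAI_PAT")
if not pat:
    raise RuntimeError("CLARIFAI_PAT is not set; export it before running the examples")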

Model Types and Examples

1. LLMs (e.g., QwQ-32B-AWQ)

Clarifai SDK

  • clarifai_llm.py: Basic inference
  • clarifai_llm_stream.py: Streaming inference
  • clarifai_llm_tools.py: Tool calling example
  • clarifai_llm_async_predict.py: Asynchronous predict example
  • clarifai_llm_async_generate.py: Asynchronous generate example

OpenAI Client

  • openai_llm.py: Basic inference
  • openai_llm_stream.py: Streaming inference
  • openai_llm_tools.py: Tool calling example

LiteLLM

  • litellm_llm.py: Basic inference
  • litellm_llm_stream.py: Streaming inference
  • litellm_llm_tools.py: Tool calling example

2. Multimodal Models (e.g., GPT-4_1)

Clarifai SDK

  • clarifai_multimodal.py: Basic inference
  • clarifai_multimodal_stream.py: Streaming inference
  • clarifai_multimodal_tools.py: Tool calling example

OpenAI Client

  • openai_multimodal.py: Basic inference
  • openai_multimodal_stream.py: Streaming inference
  • openai_multimodal_tools.py: Tool calling example

LiteLLM

  • litellm_multimodal.py: Basic inference
  • litellm_multimodal_stream.py: Streaming inference
  • litellm_multimodal_tools.py: Tool calling example
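
For a quick sense of the request shape, a multimodal call through the OpenAI-compatible endpoint pairs a text part with an image part in the message content. This is a minimal sketch using the client configured under Usage Examples below; the model URL and image URL are placeholders:

# Multimodal request: text plus image in a single user message
response = client.chat.completions.create(
    model="CLARIFAI_MODEL_URL",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
print(response.choices[0].message.content)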

Usage Examples

Basic Inference

# Using Clarifai SDK
from clarifai.client import Model

model = Model(url="https://clarifai.com/qwen/qwenLM/models/QwQ-32B-AWQ")
response = model.predict("What is the capital of France?")
print(response)

# Using OpenAI Client
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key="YOUR_PAT"
)
response = client.chat.completions.create(
    model="CLARIFAI_MODEL_URL",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response.choices[0].message.content)

# Using LiteLLM
import litellm

response = litellm.completion(
    model="openai/CLARIFAI_MODEL_URL",
    api_key="YOUR_PAT",
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response.choices[0].message.content)

Streaming Inference

# Using Clarifai SDK
response_stream = model.generate("Tell me a story")
for chunk in response_stream:
    print(chunk, end="", flush=True)

# Using OpenAI Client
stream = client.chat.completions.create(
    model="CLARIFAI_MODEL_URL",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    # delta.content can be None (e.g. on the final chunk), so guard with `or ""`
    print(chunk.choices[0].delta.content or "", end="", flush=True)

# Using LiteLLM
for chunk in litellm.completion(
    model="openai/CLARIFAI_MODEL_URL",
    api_key="YOUR_PAT",
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Tool Calling

# Example tool definition
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Tokyo, Japan"
                }
            },
            "required": ["location"]
        }
    }
}]

# Using Clarifai SDK
response = model.predict(
    prompt="What's the weather in Tokyo?",
    tools=tools,
    tool_choice="auto"
)

# Using OpenAI Client
response = client.chat.completions.create(
    model="CLARIFAI_MODEL_URL",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Using LiteLLM
response = litellm.completion(
    model="openai/CLARIFAI_MODEL_URL",
    api_key="YOUR_PAT",
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
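
When the model decides to call a tool, the response contains the tool's name and JSON-encoded arguments rather than a final answer. A minimal sketch of the follow-up round trip, using the OpenAI client response above (get_weather here is a placeholder you would implement yourself):

import json

# Inspect the first tool call in the OpenAI-style response
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Run your own implementation of the tool, then send its result back to the model
result = get_weather(**args)  # placeholder implementation
followup = client.chat.completions.create(
    model="CLARIFAI_MODEL_URL",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"},
        response.choices[0].message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": str(result)}
    ],
    tools=tools
)
print(followup.choices[0].message.content)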

Async Inference

# Example block to call async_predict from notebook cells
from clarifai.client.model import Model

async def main():
    model = Model(url="https://clarifai.com/qwen/qwenLM/models/QwQ-32B-AWQ")
    response = await model.async_predict(prompt="What is the value of pi?",
                                         max_tokens=100)
    return response

await main()

# Example block to call async_generate from notebook cells
from clarifai.client.model import Model

async def main():
    model = Model(url="https://clarifai.com/qwen/qwenLM/models/QwQ-32B-AWQ")
    response = await model.async_generate(prompt="What is the value of pi?",
                                          max_tokens=100)
    return response

# Iterate over the response from the async generator
response = await main()
for res in response:
    print(res)
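
Outside a notebook, top-level await is not available; in a plain Python script you would drive the async_predict example's coroutine with asyncio.run instead, for example:

import asyncio

# In a script, run the coroutine with asyncio.run instead of top-level await
if __name__ == "__main__":
    result = asyncio.run(main())
    print(result)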

Notes

  • Always ensure your Clarifai PAT is set as an environment variable
  • For multimodal models, provide both text and image inputs as required
  • Tool calling support may vary depending on the model's capabilities
  • Streaming responses arrive token by token and formatting may differ across SDKs
  • Error handling and retry logic should be implemented in production environments; see the sketch below
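
As a starting point, a minimal retry wrapper with exponential backoff might look like the following; the broad except clause is a deliberate simplification and should be narrowed to the errors your SDK actually raises (e.g. network or rate-limit errors):

import time

def predict_with_retry(model, prompt, max_attempts=3, base_delay=1.0):
    # Retry model.predict with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(1, max_attempts + 1):
        try:
            return model.predict(prompt)
        except Exception as exc:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)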

Additional Resources