TensorZero

#1 Repository Of The Day

TensorZero is an open-source stack for industrial-grade LLM applications:

Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency overhead)
Observability: store inferences and feedback in your database, available programmatically or in the UI
Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies
Evaluation: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

Take what you need, adopt incrementally, and complement with other tools.

Website · Docs · Twitter · Slack · Discord

Quick Start (5min) · Deployment Guide · API Reference · Configuration Reference

Note

Coming Soon: TensorZero Autopilot

TensorZero Autopilot is an automated AI engineer (powered by the TensorZero Stack) that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. Learn more · Join the waitlist

Features

🌐 LLM Gateway

Integrate with TensorZero once and access every major LLM provider.

Call any LLM (API or self-hosted) through a single unified API
Infer with streaming, tool use, structured outputs (JSON), batch, embeddings, multimodal (images, files), caching, etc.
Create prompt templates and schemas to enforce a structured interface between your application and the LLMs
Satisfy extreme throughput and latency needs, thanks to 🦀 Rust: <1ms p99 latency overhead at 10k+ QPS
Use any programming language: integrate via our Python SDK, any OpenAI SDK, or our HTTP API
Ensure high availability with routing, retries, fallbacks, load balancing, granular timeouts, etc.
Track usage and cost and enforce custom rate limits with granular scopes (e.g. tags)
Set up auth for TensorZero to allow clients to access models without sharing provider API keys
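
The retry and fallback behavior listed above can be pictured with a small client-side sketch. This is not TensorZero's implementation or configuration format: `call_with_fallbacks`, `ProviderError`, and the two toy providers are hypothetical names, used only to illustrate trying providers in order with a bounded number of retries each.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (hypothetical) provider call that failed."""


def call_with_fallbacks(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_retries: int = 2,
) -> str:
    """Try each provider in order, retrying each up to `max_retries` extra times."""
    errors = []
    for provider in providers:
        for attempt in range(max_retries + 1):
            try:
                return provider(prompt)
            except ProviderError as exc:
                errors.append(f"{provider.__name__} attempt {attempt + 1}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Toy providers standing in for real model endpoints (illustration only).
def flaky_primary(prompt: str) -> str:
    raise ProviderError("primary unavailable")


def stable_fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


result = call_with_fallbacks([flaky_primary, stable_fallback], "ping", max_retries=1)
print(result)
```

In a gateway, this logic lives server-side, so every client gets the same availability behavior without reimplementing it.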

Supported Model Providers

Anthropic, AWS Bedrock, AWS SageMaker, Azure, DeepSeek, Fireworks, GCP Vertex AI Anthropic, GCP Vertex AI Gemini, Google AI Studio (Gemini API), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, and xAI (Grok). Need something else? TensorZero also supports any OpenAI-compatible API (e.g. Ollama).

Usage Example

You can use TensorZero with any OpenAI SDK (Python, Node, Go, etc.) or OpenAI-compatible client.

Deploy the TensorZero Gateway (one Docker container).
Update the base_url and model in your OpenAI-compatible client.
Run inference:

from openai import OpenAI

# Point the client to the TensorZero Gateway
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    # Call any model provider (or TensorZero function)
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about TensorZero.",
        }
    ],
)

See Quick Start for more information.
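
Since the gateway lists streaming among its features, the same client can plausibly be used with `stream=True`. The sketch below assumes the gateway's OpenAI-compatible endpoint returns chunks in the shape the OpenAI Python SDK normally exposes (`chunk.choices[0].delta.content`). The helper names `tensorzero_model_name` and `stream_text` are hypothetical and not part of any SDK.

```python
from typing import Any


def tensorzero_model_name(provider: str, model: str) -> str:
    """Build a model string in the `tensorzero::model_name::<provider>::<model>` form shown above."""
    return f"tensorzero::model_name::{provider}::{model}"


def stream_text(client: Any, model: str, prompt: str) -> str:
    """Request a streamed completion and join the text deltas into one string."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text (e.g. role-only or final chunks).
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


# Usage against a running gateway (not executed here):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")
# model = tensorzero_model_name("anthropic", "claude-sonnet-4-6")
# print(stream_text(client, model, "Write a haiku about TensorZero."))
```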

🔍 LLM Observability

Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.

Store inferences and feedback (metrics, human edits, etc.) in your own database
Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically
Build datasets for optimization, evaluation, and other workflows
Replay historical inferences with new prompts, models, inference strategies, etc.
Export OpenTelemetry traces (OTLP) and Prometheus metrics to your favorite application observability tools
Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling

📈 LLM Optimization

Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.

Optimize your models with supervised fine-tuning, RLHF, and other techniques
Optimize your prompts with automated prompt engineering algorithms like GEPA and MIPROv2
Optimize your inference strategy with dynamic in-context learning, best/mixture-of-N sampling, etc.
Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models
Soon: synthetic data generation
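
As a rough illustration of the best-of-N sampling mentioned above, the sketch below generates several candidates and keeps the one a scorer rates highest. It is a generic sketch, not TensorZero's implementation: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical stand-ins for an LLM call and a judge.

```python
import random
from typing import Callable, List


def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    prompt: str,
    n: int = 5,
) -> str:
    """Generate n candidates and return the one the scorer rates highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda candidate: score(prompt, candidate))


# Toy stand-ins for an LLM and a judge (hypothetical, for illustration only).
rng = random.Random(0)
WORDS = ["gateway", "metrics", "feedback", "flywheel", "latency"]


def toy_generate(prompt: str) -> str:
    return " ".join(rng.choice(WORDS) for _ in range(rng.randint(1, 4)))


def toy_score(prompt: str, candidate: str) -> float:
    # Prefer candidates with more distinct words.
    return float(len(set(candidate.split())))


best = best_of_n(toy_generate, toy_score, "Describe TensorZero", n=8)
print(best)
```

The trade-off is cost: N candidates mean roughly N times the generation work, plus the scoring calls.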

📊 LLM Evaluation

Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.

Evaluate individual inferences with inference evaluations powered by heuristics or LLM judges (≈ unit tests for LLMs)
Evaluate end-to-end workflows with fully flexible workflow evaluations (≈ integration tests for LLMs)
Optimize LLM judges just like any other TensorZero function to align them to human preferences
Soon: more built-in evaluators; headless evaluations

Evaluation » CLI

docker compose run --rm evaluations \
  --evaluation-name extract_data \
  --dataset-name hard_test_cases \
  --variant-name gpt_4o \
  --concurrency 5

Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4
Number of datapoints: 100
██████████████████████████████████████ 100/100
exact_match: 0.83 ± 0.03 (n=100)
semantic_match: 0.98 ± 0.01 (n=100)
item_count: 7.15 ± 0.39 (n=100)
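
To make a summary line like the ones above concrete, here is a minimal sketch of a heuristic evaluator and its aggregate. The `exact_match` and `summarize` helpers are hypothetical. Reporting the spread as a standard error of the mean is an assumption, because the output above does not say what its `±` value represents.

```python
import math
from typing import Sequence, Tuple


def exact_match(prediction: str, reference: str) -> float:
    """Heuristic evaluator: 1.0 if the normalized strings are equal, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def summarize(scores: Sequence[float]) -> Tuple[float, float, int]:
    """Return (mean, standard error of the mean, n) for a list of scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores to summarize")
    mean = sum(scores) / n
    if n == 1:
        return mean, 0.0, n
    # Sample variance (n - 1 denominator), then standard error of the mean.
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(variance / n), n


pairs = [("Paris", "paris"), ("Berlin", "Bonn"), (" Rome ", "Rome"), ("Oslo", "Oslo")]
scores = [exact_match(p, r) for p, r in pairs]
mean, sem, n = summarize(scores)
print(f"exact_match: {mean:.2f} ± {sem:.2f} (n={n})")
# prints "exact_match: 0.75 ± 0.25 (n=4)"
```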

🧪 LLM Experimentation

Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

Run adaptive A/B tests to ship with confidence and identify the best prompts and models for your use cases.
Enforce principled experiments in complex workflows, including support for multi-turn LLM systems, sequential testing, and more.
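
One common way to make an A/B test adaptive is Thompson sampling, which shifts traffic toward the variant that appears to perform better as feedback arrives. The sketch below is a generic Beta-Bernoulli version with simulated feedback. It is not presented as the algorithm TensorZero uses, and the `ThompsonSampler` class, variant names, and success rates are all hypothetical.

```python
import random
from typing import Dict, List


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over named variants."""

    def __init__(self, variants: List[str], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Beta(1, 1) prior: [successes + 1, failures + 1] per variant.
        self.params: Dict[str, List[int]] = {v: [1, 1] for v in variants}

    def choose(self) -> str:
        """Sample a plausible success rate per variant and pick the highest."""
        draws = {v: self.rng.betavariate(a, b) for v, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, variant: str, success: bool) -> None:
        """Record one binary feedback signal for the chosen variant."""
        self.params[variant][0 if success else 1] += 1


# Simulated environment: prompt_b truly succeeds more often (made-up rates).
true_rates = {"prompt_a": 0.55, "prompt_b": 0.80}
sampler = ThompsonSampler(list(true_rates), seed=42)
env = random.Random(7)
counts = {v: 0 for v in true_rates}
for _ in range(2000):
    variant = sampler.choose()
    counts[variant] += 1
    sampler.update(variant, env.random() < true_rates[variant])
print(counts)
```

Compared with a fixed 50/50 split, this sends fewer requests to the weaker variant while the experiment is still running.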

& more!

Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.

Build simple applications or massive deployments with GitOps-friendly orchestration
Extend TensorZero with built-in escape hatches, programmatic-first usage, direct database access, and more
Integrate with third-party tools: specialized observability and evaluations, model providers, agent orchestration frameworks, etc.
Iterate quickly by experimenting with prompts interactively using the Playground UI

Frequently Asked Questions

How is TensorZero different from other LLM frameworks?

TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.
TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.
TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.

Can I use TensorZero with ___?

Yes. TensorZero supports every major programming language and plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM.

Is TensorZero production-ready?

Yes. TensorZero is used by companies ranging from frontier AI startups to the Fortune 50.

Here's a case study: Automating Code Changelogs at a Large Bank with LLMs

How much does TensorZero cost?

TensorZero Stack (LLMOps platform) is 100% self-hosted and open-source.

TensorZero Autopilot (automated AI engineer) is a complementary paid product powered by the TensorZero Stack.

Who is building TensorZero?

Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). See our $7.3M seed round announcement and coverage from VentureBeat. We're hiring in NYC.

How do I get started?

You can adopt TensorZero incrementally. Our Quick Start goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.

Demo

Watch LLMs get better at data extraction in real-time with TensorZero!

Dynamic in-context learning (DICL) is a powerful inference-time optimization available out of the box with TensorZero. It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
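
The idea behind DICL can be sketched in a few lines: retrieve the historical examples most similar to the new input and place them in the prompt as prior turns. This is a toy illustration, not TensorZero's implementation. The bag-of-words `embed` function stands in for a real embedding model, and `build_messages` and the sample history are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (input, output) pair from historical inferences


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_messages(query: str, history: List[Example], k: int = 2) -> List[Dict[str, str]]:
    """Prepend the k most similar historical examples as prior chat turns."""
    q = embed(query)
    ranked = sorted(history, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    messages: List[Dict[str, str]] = []
    for example_input, example_output in ranked[:k]:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages


history = [
    ("Extract names: Alice met Bob in Paris", '{"person": ["Alice", "Bob"], "location": ["Paris"]}'),
    ("Translate to French: good morning", "bonjour"),
    ("Extract names: Carol flew to Tokyo", '{"person": ["Carol"], "location": ["Tokyo"]}'),
]
messages = build_messages("Extract names: Dave moved to Oslo", history, k=2)
for message in messages:
    print(message["role"], "|", message["content"])
```

Because the examples are chosen at inference time, newly stored inferences can influence later prompts without retraining the model.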

Get Started

Start building today. The Quick Start shows it's easy to set up an LLM application with TensorZero.

Questions? Ask us on Slack or Discord.

Using TensorZero at work? Email us at hello@tensorzero.com to set up a Slack or Teams channel with your team (free).

Examples

We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.

Optimizing Data Extraction (NER) with TensorZero

This example shows how to use TensorZero to optimize a data extraction pipeline. We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL). In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.

Agentic RAG — Multi-Hop Question Answering with LLMs

This example shows how to build a multi-hop retrieval agent using TensorZero. The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.

Writing Haikus to Satisfy a Judge with Hidden Preferences

This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see progress by fine-tuning the LLM multiple times.

Image Data Extraction — Multimodal (Vision) Fine-tuning

This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks. Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).

Improving LLM Chess Ability with Best-of-N Sampling

This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.

Blog Posts

We write about LLM engineering on the TensorZero Blog. Here are some of our favorite posts:
