Inference optimization for LLMs, diffusion, and voice. Self-hosted or cloud. Works on NVIDIA GPUs, Apple Silicon, and edge devices.
Links:
Web App • Docs • Hugging Face • X • LinkedIn • Discord (request invite) • Email
TheStage AI is an inference optimization stack. It helps you compress, compile, and serve models. You keep control of the accuracy versus performance trade-off.
Components:
- **ANNA (Automatic Neural Network Acceleration)**: automated compression analysis under user-defined constraints (size, MACs, latency, memory). Outputs a QlipConfig for compile and serve.
- **Qlip**: full-stack optimization and inference framework. Quantization, sparsification, and compilation for NVIDIA GPUs (Apple Silicon supported). Produces pre-compiled (non-JIT) artifacts with dynamic shapes and mixed precision. Triton-based serving.
- **Elastic Models**: Qlip-optimized models with S / M / L / XL performance tiers (availability varies). L/M/S tiers may include quantization or pruning for faster inference.
- **CLI (`thestage`)**: manage projects, tokens, and hardware from the terminal. Launch and monitor jobs, rent instances, and stream logs.
- **Web App**: web UI and APIs for instances, models, and deployments. Includes the Playground to test Elastic Models, switch hardware, and compare tiers before deployment.
Key features:
- Elastic Models with S/M/L/XL tiers per model (choose cost, quality, and memory balance; availability varies).
- ANNA constraint-driven compression analysis (outputs a QlipConfig for compile and serve).
- Qlip compiler and runtime (pre-compiled engines; no runtime JIT; dynamic shapes; mixed precision).
- OpenAI-compatible HTTP serving (deploy and scale models through a standard API; see the sketch after this list).
- Playground to test models and hardware (compare performance and tiers before deployment).
- Self-host or run in the cloud (use your own infrastructure; keep data private).
- Hardware support: NVIDIA (incl. Jetson), Apple Silicon, and edge targets (NPUs, DSPs, and MCUs per model).
- Comprehensive tutorials and documentation (from setup to evaluation and production).
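Because serving is OpenAI-compatible, the standard `openai` Python client works against a deployment. A minimal sketch; the endpoint, token, and model id are placeholders, so take the real values from your deployment in the web app:

```python
# Minimal sketch: calling a deployment through its OpenAI-compatible API
# with the standard `openai` client. Endpoint, token, and model id below
# are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder: your deployment URL
    api_key="<YOUR_API_TOKEN>",                        # placeholder: your API token
)

response = client.chat.completions.create(
    model="<deployed-model-name>",  # placeholder: the model id you deployed
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```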
Quick start:
- Install the CLI: `pip install thestage`
- Set your token: `thestage config set --api-token <YOUR_API_TOKEN>` (get it in the web app)
- Use `elastic_models` in your code and choose a tier (S/M/L/XL); see the sketch below.
- Diffusion and voice examples are in the docs.
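A minimal sketch of the `elastic_models` flow, assuming the package mirrors the Hugging Face `transformers` loading API with a tier-selection argument; the exact import path, argument name, and model availability may differ, so check the docs for the real signature:

```python
# Hedged sketch, not verified against the real elastic_models API:
# assumes a transformers-style loader that accepts a tier selector.
from transformers import AutoTokenizer
from elastic_models.transformers import AutoModelForCausalLM  # assumed import path

model_name = "mistralai/Mistral-7B-Instruct-v0.3"  # example model; availability varies

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    mode="S",  # assumed tier selector: one of S / M / L / XL
).to("cuda")

inputs = tokenizer("Hello, world!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```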
An OpenAI-compatible API flow with Modal is documented (single- and multi-GPU).
Start here: https://docs.thestage.ai/
Supported hardware:
- NVIDIA GPUs (incl. Jetson where applicable)
- Apple Silicon
- Edge/embedded devices