One API. Any inference architecture.
Grove is a Kubernetes API that provides a single declarative interface for orchestrating any AI inference workload, from simple single-pod deployments to complex multi-node, disaggregated systems. Grove lets you scale a multi-node inference deployment from a single replica to data center scale, supporting tens of thousands of GPUs. It allows you to describe your whole inference serving system (e.g., prefill, decode, routing, or any other component) as a single Kubernetes Custom Resource (CR). From that one spec, the platform coordinates hierarchical gang scheduling, topology-aware placement, multi-level autoscaling, and explicit startup ordering. You get precise control over how the system behaves without stitching together scripts, YAML files, or custom controllers.
Get Grove running in 5 minutes on a local kind cluster.
```bash
# 1. Create a local kind cluster
cd operator && make kind-up

# 2. Deploy Grove
make deploy

# 3. Deploy your first workload
kubectl apply -f samples/simple/simple1.yaml

# 4. Fetch the resources created by Grove
kubectl get pcs,pclq,pcsg,pg,pod -o wide
```

Follow along with this example in the → Quickstart Doc
For more installation options, including local and remote Kubernetes clusters, see the → Installation Docs
Modern AI inference workloads need capabilities that Kubernetes doesn't provide out of the box:
- Scaling for Multi-Node/Multi-Pod Units - Large models may be sharded across multiple nodes, meaning a single model instance spans multiple pods. In this case, the fundamental scaling unit is no longer an individual pod, but an entire group of pods that together form one model instance.
- Hierarchical Gang scheduling - Multi-node model instances require pods to be scheduled together; if only a subset of the required pods are scheduled, the model is unusable, resources remain idle, and the system can deadlock waiting for the remaining pods. Disaggregated inference has similar constraints: at least one prefill instance and one decode instance must be scheduled to form a functional pipeline. Therefore, gang scheduling must occur at multiple levels, ensuring required components start together as an all-or-nothing unit.
- Startup ordering - Even when components must be scheduled together (e.g., leader and worker pods in a multi-node model instance), there are cases where they must start in a specific order. For example, MPI workloads require all worker pods to be ready before the leader pod launches the application. Explicit startup ordering ensures correct initialization and avoids failures caused by components starting out of order; a sketch of how this could be expressed follows this list.
- Topology-aware placement - Components in an inference system often communicate heavily with each other. Network-optimized placement, e.g., within NVLink domains, is crucial to minimize communication overhead and maximize performance.
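To make startup ordering concrete, the fragment below sketches how a leader clique might declare a dependency on its workers. This is a hedged illustration, not the authoritative schema: the `startsAfter` field and the surrounding structure are assumptions here, so check the API Reference for the exact fields.

```yaml
# Hypothetical fragment for illustration only; treat field names as assumptions.
# Intent: the leader clique starts only after the worker clique is ready,
# mirroring the MPI leader/worker pattern described above.
cliques:
  - name: worker
    spec:
      replicas: 4          # worker pods that must be ready first
  - name: leader
    spec:
      replicas: 1
      startsAfter:         # assumed dependency field; see the API Reference
        - worker
```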
Grove introduces four simple concepts:
| Concept | Description |
|---|---|
| PodClique | A group of pods representing a specific role (e.g., leader, worker, frontend). Each clique has an independent configuration and supports custom scaling logic. |
| PodCliqueScalingGroup | A set of PodCliques that scale and are scheduled together as a gang. Ideal for tightly coupled roles like prefill leader and worker. |
| PodCliqueSet | The top-level Grove object that defines a group of components managed and colocated together. Also supports autoscaling with topology-aware spread of PodCliqueSet replicas for availability. |
| PodGang | The scheduler API that defines a unit of gang-scheduling. A PodGang is a collection of groups of similar pods, where each pod group defines a minimum number of replicas guaranteed for gang-scheduling. |
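To show how these concepts compose, here is a hedged sketch of a PodCliqueSet for disaggregated serving: a frontend clique that scales on its own, plus prefill and decode cliques bound into a scaling group so they scale and gang-schedule together. Field names such as `template`, `cliques`, and `podCliqueScalingGroups` are illustrative assumptions based on the table above, not a verbatim schema; see the API Reference and the samples/ directory for working manifests.

```yaml
# Illustrative sketch only; treat field names as assumptions and consult the
# API Reference for the authoritative schema.
apiVersion: grove.io/v1alpha1        # assumed group/version
kind: PodCliqueSet
metadata:
  name: disagg-serving
spec:
  replicas: 1                        # replicas of the entire component group
  template:
    cliques:
      - name: frontend               # routing/API layer, scales independently
        spec:
          replicas: 2                # real manifests also carry a pod template per clique
      - name: prefill
        spec:
          replicas: 1
      - name: decode
        spec:
          replicas: 1
    podCliqueScalingGroups:          # prefill + decode scale and schedule as one gang
      - name: pd-pair
        cliqueNames: [prefill, decode]
```

From a spec like this, Grove derives the PodGang objects that a gang-aware scheduler consumes, so each unit is placed all-or-nothing.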
Get started with a step-by-step hands-on Grove tutorial here → Core Concepts Overview
Refer to all Grove APIs here → API Reference
- Multi-Node, Disaggregated Inference for large models (DeepSeek-R1, Llama-4-Maverick): Visualization
- Single-Node, Disaggregated Inference : Visualization
- Agentic Pipeline of Models : Visualization
- Standard Aggregated Single Node or Single GPU Inference : Visualization
- Hierarchical Gang Scheduling ✅
- Multi-Level Horizontal Auto-Scaling ✅
- Startup Ordering ✅
- Rolling Updates ✅
- Topology-Aware Scheduling
- Resource-Optimized Rolling Updates
- Topology Spread Constraints
- Automatic Topology Detection
- And More!
Please read the contribution guide before creating your first PR!
Grove is an open-source project and we welcome community engagement!
Please feel free to start a discussion thread if you want to discuss a topic of interest.
If you run into an issue or would like to request a feature enhancement, please create a GitHub Issue with the appropriate tag.
To reach the Grove user and developer community directly, please join the NVIDIA Dynamo Discord server or the Grove mailing list.