Grove

One API. Any inference architecture.

Grove is a Kubernetes API that provides a single declarative interface for orchestrating any AI inference workload, from simple single-pod deployments to complex multi-node, disaggregated systems. Grove lets you scale a multi-node inference deployment from a single replica to data center scale, supporting tens of thousands of GPUs. It allows you to describe your whole inference serving system in Kubernetes, including prefill, decode, routing, and any other components, as a single Custom Resource (CR). From that one spec, the platform coordinates hierarchical gang scheduling, topology-aware placement, multi-level autoscaling, and explicit startup ordering. You get precise control over how the system behaves without stitching together scripts, YAML files, or custom controllers.

Quick Start on Local Kind Cluster

Get Grove running in 5 minutes on a local kind cluster.

# 1. Create a local kind cluster
cd operator && make kind-up

# 2. Deploy Grove
make deploy

# 3. Deploy your first workload
kubectl apply -f samples/simple/simple1.yaml

# 4. Fetch the resources created by Grove
kubectl get pcs,pclq,pcsg,pg,pod -o wide
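
Step 4 uses the resources' Kubernetes short names: pcs, pclq, pcsg, and pg correspond to PodCliqueSet, PodClique, PodCliqueScalingGroup, and PodGang, the four Grove concepts described under How It Works below.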

Follow along with this example in the Quickstart Doc.

For more installation options, including local and remote Kubernetes clusters, see the Installation Docs.

Motivation

Modern AI inference workloads need capabilities that Kubernetes doesn't provide out of the box:

  • Scaling for Multi-Node/Multi-Pod Units - Large models may be sharded across multiple nodes, meaning a single model instance spans multiple pods. In this case, the fundamental scaling unit is no longer an individual pod but an entire group of pods that together form one model instance.
  • Hierarchical Gang Scheduling - Multi-node model instances require pods to be scheduled together; if only a subset of the required pods is scheduled, the model is unusable, resources sit idle, and the system can deadlock waiting for the remaining pods. Disaggregated inference has similar constraints: at least one prefill instance and one decode instance must be scheduled to form a functional pipeline. Gang scheduling must therefore occur at multiple levels, ensuring required components start together as an all-or-nothing unit.
  • Startup Ordering - Even when components must be scheduled together (e.g., leader and worker pods in a multi-node model instance), there are cases where they must start in a specific order. For example, MPI workloads require all worker pods to be ready before the leader pod launches the application. Explicit startup ordering ensures correct initialization and avoids failures caused by components starting out of order.
  • Topology-Aware Placement - Components in an inference system often communicate heavily with each other. Network-optimized placement, e.g. within NVLink domains, is crucial for minimizing communication overhead and maximizing performance.

How It Works

Grove introduces four simple concepts:

  • PodClique - A group of pods representing a specific role (e.g., leader, worker, frontend). Each clique has an independent configuration and supports custom scaling logic.
  • PodCliqueScalingGroup - A set of PodCliques that scale and are scheduled together as a gang. Ideal for tightly coupled roles like prefill leader and worker.
  • PodCliqueSet - The top-level Grove object that defines a group of components managed and colocated together. Also supports autoscaling with topology-aware spread of PodCliqueSet replicas for availability.
  • PodGang - The scheduler API that defines a unit of gang scheduling. A PodGang is a collection of groups of similar pods, where each pod group defines a minimum number of replicas guaranteed for gang scheduling.
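
To make these concepts concrete, the sketch below shows what a PodCliqueSet manifest for a small disaggregated deployment could look like. It is illustrative only: the field names (cliques, scalingGroups, podSpec, and so on) and the API group/version are assumptions, not the verified schema; see the API Reference below for the actual fields.

# Illustrative sketch only. Field names below (cliques, scalingGroups,
# podSpec, ...) are assumptions for explanation, not the verified Grove
# schema; consult the API Reference for the real API.
apiVersion: grove.io/v1alpha1      # assumed API group/version
kind: PodCliqueSet
metadata:
  name: disagg-inference
spec:
  replicas: 1                      # replicas of the entire component group
  template:
    cliques:                       # one PodClique per role
    - name: frontend               # routing/frontend role
      replicas: 1
      podSpec:
        containers:
        - name: router
          image: example.com/router:latest
    - name: prefill
      replicas: 2
      podSpec:
        containers:
        - name: prefill-worker
          image: example.com/prefill:latest
    - name: decode
      replicas: 4
      podSpec:
        containers:
        - name: decode-worker
          image: example.com/decode:latest
    scalingGroups:                 # PodCliqueScalingGroup: cliques that
    - name: prefill-decode         # scale and gang-schedule together
      cliques: ["prefill", "decode"]

Applying a single manifest like this would create the underlying PodCliques, scaling groups, and PodGangs, which you can then inspect with kubectl get pcs,pclq,pcsg,pg as in the quickstart.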

Get started with a step-by-step, hands-on Grove tutorial in the Core Concepts Overview.

Refer to the full set of Grove APIs in the API Reference.

Example Use Cases

  • Multi-Node, Disaggregated Inference for large models (DeepSeek-R1, Llama-4-Maverick): Visualization
  • Single-Node, Disaggregated Inference: Visualization
  • Agentic Pipeline of Models: Visualization
  • Standard Aggregated Single-Node or Single-GPU Inference: Visualization

Roadmap

2025 Priorities

  • Hierarchical Gang Scheduling ✅
  • Multi-Level Horizontal Auto-Scaling ✅
  • Startup Ordering ✅
  • Rolling Updates ✅
  • Topology-Aware Scheduling

2026 Priorities

  • Resource-Optimized Rolling Updates
  • Topology Spread Constraints
  • Automatic Topology Detection
  • And More!

Contributions

Please read the contribution guide before creating your first PR!

Community, Discussion, and Support

Grove is an open-source project and we welcome community engagement!

Please feel free to start a discussion thread for any topic of interest.

If you run into an issue or would like to request a feature enhancement, please create a GitHub Issue with the appropriate tag.

To reach the Grove user and developer community directly, join the NVIDIA Dynamo Discord server or the Grove mailing list.