
# DGX Spark Network Benchmarks and LLM Serving

Hands-on tutorials for RDMA networking, disaggregated LLM serving, and Kubernetes cluster setup on NVIDIA DGX Spark systems. All notebooks run real code on real hardware: no simulated benchmarks, no hardcoded numbers.

## Hardware

| Node     | GPU            | Network                             |
| -------- | -------------- | ----------------------------------- |
| spark-01 | 1x NVIDIA GB10 | RoCE (RDMA over Converged Ethernet) |
| spark-02 | 1x NVIDIA GB10 | RoCE (RDMA over Converged Ethernet) |

The nodes are direct-connected via dual 100 Gbps links.
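As a back-of-envelope for what that link budget means in practice, the sketch below computes ideal wire-rate transfer times for a KV-cache-sized payload; the numbers are illustrative, not measured, and ignore protocol overhead and per-message latency.

```python
# Back-of-envelope: ideal transfer time over the 100 Gbps links
# (wire rate only; real RDMA throughput will be somewhat lower).

def transfer_time_s(bytes_to_move: int, gbps: float) -> float:
    """Seconds to move `bytes_to_move` at `gbps` gigabits per second."""
    return bytes_to_move * 8 / (gbps * 1e9)

payload = 1024**3  # 1 GiB, e.g. a large KV-cache transfer

single = transfer_time_s(payload, 100)  # one link
dual = transfer_time_s(payload, 200)    # both links, perfect striping

print(f"1 GiB over 1x100 Gbps: {single * 1e3:.1f} ms")
print(f"1 GiB over 2x100 Gbps: {dual * 1e3:.1f} ms")
```

The multi-rail notebooks measure how close Linux bonding and NIXL multi-rail actually get to that 2x ceiling.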

## Tutorials

### RDMA Network Benchmarks

Measure RDMA performance between two direct-connected DGX Spark nodes. Covers single-link benchmarks, multi-rail bonding, and NIXL GPU-to-GPU transfers.

| Notebook               | Description                                                                        |
| ---------------------- | ---------------------------------------------------------------------------------- |
| 01_InfiniBand_Tutorial | RDMA basics, ib_write_bw vs iperf3, single-link bandwidth and latency measurements |
| 02_Multi_Rail_Tutorial | Dual-link performance: Linux bonding vs NIXL multi-rail RDMA comparison            |
| 03_NixlBench           | Systematic nixlbench benchmarking for GPU-to-GPU RDMA transfer throughput          |
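When driving `ib_write_bw` from a notebook, the interesting number is buried in its text output. A small parser like the one below can pull it out; the column layout is assumed from typical perftest builds (`#bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]`), so verify it against your build's header line first.

```python
# Minimal parser for a perftest (ib_write_bw) result row.
# Assumed column order: bytes, iterations, BW peak, BW average, MsgRate.

def parse_bw_row(row: str) -> dict:
    """Parse one whitespace-separated perftest result row into a dict."""
    bytes_, iters, bw_peak, bw_avg, msg_rate = row.split()
    return {
        "bytes": int(bytes_),
        "iterations": int(iters),
        "bw_peak_MBps": float(bw_peak),
        "bw_avg_MBps": float(bw_avg),
        "msg_rate_Mpps": float(msg_rate),
    }

# Illustrative numbers, not a real measurement:
sample = "65536 5000 11230.51 11198.77 0.179180"
print(parse_bw_row(sample)["bw_avg_MBps"])
```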

### Disaggregated LLM Serving

Split prefill and decode across two DGX Spark nodes using vLLM and NIXL. Builds understanding progressively, from a single-node baseline through replicated serving to disaggregated inference.

| Notebook                    | Description                                                                                           |
| --------------------------- | ----------------------------------------------------------------------------------------------------- |
| 00_Environment_Setup        | Verify GPU, network, and software configuration on both nodes                                         |
| 01_Local_Inference_Baseline | Single-node vLLM performance with continuous batching (the bar to beat)                               |
| 02_Understanding_KV_Cache   | KV cache dimensions from model architecture constants, TCP vs RDMA transfer cost                      |
| 03_Replicated_Serving       | Two independent vLLM instances behind a round-robin proxy (fair comparison baseline)                  |
| 04_Disaggregated_Serving    | Prefill on spark-01, decode on spark-02 via vLLM's NixlConnector for GPU-to-GPU KV cache transfer     |
| 05_Production_Benchmarking  | guidellm sweeps across all three configurations with TTFT/TPOT/ITL breakdowns and P50/P95/P99 distributions |
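The kind of KV-cache sizing that notebook 02 derives from architecture constants can be sketched for the model used here. The constants below come from the published Llama-3.1-8B configuration (32 layers, GQA with 8 KV heads, head dimension 128); the per-token figure follows directly.

```python
# KV cache size per token for meta-llama/Llama-3.1-8B-Instruct,
# from the model's published architecture constants (GQA: 8 KV heads).
num_layers = 32
num_kv_heads = 8
head_dim = 128
dtype_bytes = 2  # fp16/bf16

# K and V each store num_kv_heads * head_dim values per layer.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
print(f"{bytes_per_token} bytes/token ({bytes_per_token / 1024:.0f} KiB)")

# The KV cache that disaggregated serving must ship between nodes
# for a 2048-token prompt:
prompt_tokens = 2048
total = bytes_per_token * prompt_tokens
print(f"{total / 1024**2:.0f} MiB for a {prompt_tokens}-token prefill")
```

That 128 KiB/token figure is what makes the TCP-vs-RDMA transfer comparison in notebook 02 interesting: long prompts produce caches in the hundreds of MiB.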

### Kubernetes Cluster Setup

Deploy a 3-node Kubernetes cluster (1 CPU controller + 2 DGX Spark GPU workers) using MicroK8s, with GPU Operator configuration for containerized inference.

| Notebook                  | Description                                                            |
| ------------------------- | ---------------------------------------------------------------------- |
| 01_MicroK8s_Cluster_Setup | Cluster formation, GPU Operator install, vLLM deployment on Kubernetes |
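The cluster-formation flow in the notebook boils down to a few MicroK8s commands; the sketch below assumes MicroK8s is already installed on all three nodes, and the GPU addon name may differ slightly across MicroK8s releases.

```shell
# On the controller: generate a join token (prints a `microk8s join` command).
microk8s add-node

# On each DGX Spark worker: run the printed join command, e.g.
#   microk8s join <controller-ip>:25000/<token>

# On the controller: enable GPU support (deploys the NVIDIA GPU Operator).
microk8s enable gpu

# Verify the workers advertise an allocatable GPU:
microk8s kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```

These are configuration steps against a live cluster, so they only make sense once both workers can reach the controller on port 25000.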

## Prerequisites

- Two DGX Spark nodes with SSH access between them
- RDMA-capable network (RoCE or InfiniBand)
- Python virtual environment with vLLM 0.13.0+, NIXL, and PyTorch 2.9.0+
- `meta-llama/Llama-3.1-8B-Instruct` cached on both nodes
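A quick preflight for the version floors above can be run in the virtual environment before starting notebook 00. The package names (`vllm`, `torch`) are their usual PyPI names; the comparison ignores pre-release and local-version tags, which is good enough for a sanity check.

```python
# Preflight: confirm the environment meets the version floors listed above.
from importlib.metadata import PackageNotFoundError, version

def meets_floor(installed: str, floor: str) -> bool:
    """Compare dotted versions numerically (strips +local tags, skips non-digits)."""
    def key(v: str) -> list[int]:
        return [int(p) for p in v.split("+")[0].split(".") if p.isdigit()]
    return key(installed) >= key(floor)

for pkg, floor in [("vllm", "0.13.0"), ("torch", "2.9.0")]:
    try:
        v = version(pkg)
        status = "OK" if meets_floor(v, floor) else f"too old (need {floor}+)"
        print(f"{pkg} {v}: {status}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```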
