
# DGX Spark Network Benchmarks and LLM Serving

Hands-on tutorials for RDMA networking, disaggregated LLM serving, and Kubernetes cluster setup on NVIDIA DGX Spark systems. All notebooks run real code on real hardware: no simulated benchmarks, no hardcoded numbers.

## Hardware

| Node     | GPU            | Network                             |
| -------- | -------------- | ----------------------------------- |
| spark-01 | 1x NVIDIA GB10 | RoCE (RDMA over Converged Ethernet) |
| spark-02 | 1x NVIDIA GB10 | RoCE (RDMA over Converged Ethernet) |

The nodes are direct-connected via dual 100 Gbps links.
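As a back-of-envelope for what that link budget means in practice, the sketch below computes ideal wire-rate transfer times for a KV-cache-sized payload; the numbers are illustrative, not measured, and ignore protocol overhead and per-message latency.

```python
# Back-of-envelope: ideal transfer time over the 100 Gbps links
# (wire rate only; real RDMA throughput will be somewhat lower).

def transfer_time_s(bytes_to_move: int, gbps: float) -> float:
    """Seconds to move `bytes_to_move` at `gbps` gigabits per second."""
    return bytes_to_move * 8 / (gbps * 1e9)

payload = 1024**3  # 1 GiB, e.g. a large KV-cache transfer

single = transfer_time_s(payload, 100)  # one link
dual = transfer_time_s(payload, 200)    # both links, perfect striping

print(f"1 GiB over 1x100 Gbps: {single * 1e3:.1f} ms")
print(f"1 GiB over 2x100 Gbps: {dual * 1e3:.1f} ms")
```

The multi-rail notebooks measure how close Linux bonding and NIXL multi-rail actually get to that 2x ceiling.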

## Tutorials

### RDMA Network Benchmarks

Measure RDMA performance between two direct-connected DGX Spark nodes. Covers single-link benchmarks, multi-rail bonding, and NIXL GPU-to-GPU transfers.

| Notebook               | Description                                                                        |
| ---------------------- | ---------------------------------------------------------------------------------- |
| 01_InfiniBand_Tutorial | RDMA basics, ib_write_bw vs iperf3, single-link bandwidth and latency measurements |
| 02_Multi_Rail_Tutorial | Dual-link performance: Linux bonding vs NIXL multi-rail RDMA comparison            |
| 03_NixlBench           | Systematic nixlbench benchmarking for GPU-to-GPU RDMA transfer throughput          |
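When driving `ib_write_bw` from a notebook, the interesting number is buried in its text output. A small parser like the one below can pull it out; the column layout is assumed from typical perftest builds (`#bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]`), so verify it against your build's header line first.

```python
# Minimal parser for a perftest (ib_write_bw) result row.
# Assumed column order: bytes, iterations, BW peak, BW average, MsgRate.

def parse_bw_row(row: str) -> dict:
    """Parse one whitespace-separated perftest result row into a dict."""
    bytes_, iters, bw_peak, bw_avg, msg_rate = row.split()
    return {
        "bytes": int(bytes_),
        "iterations": int(iters),
        "bw_peak_MBps": float(bw_peak),
        "bw_avg_MBps": float(bw_avg),
        "msg_rate_Mpps": float(msg_rate),
    }

# Illustrative numbers, not a real measurement:
sample = "65536 5000 11230.51 11198.77 0.179180"
print(parse_bw_row(sample)["bw_avg_MBps"])
```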

### Disaggregated LLM Serving

Split prefill and decode across two DGX Spark nodes using vLLM and NIXL. Builds understanding progressively, from a single-node baseline through replicated serving to disaggregated inference.

| Notebook                    | Description                                                                                           |
| --------------------------- | ----------------------------------------------------------------------------------------------------- |
| 00_Environment_Setup        | Verify GPU, network, and software configuration on both nodes                                         |
| 01_Local_Inference_Baseline | Single-node vLLM performance with continuous batching (the bar to beat)                               |
| 02_Understanding_KV_Cache   | KV cache dimensions from model architecture constants, TCP vs RDMA transfer cost                      |
| 03_Replicated_Serving       | Two independent vLLM instances behind a round-robin proxy (fair comparison baseline)                  |
| 04_Disaggregated_Serving    | Prefill on spark-01, decode on spark-02 via vLLM's NixlConnector for GPU-to-GPU KV cache transfer     |
| 05_Production_Benchmarking  | guidellm sweeps across all three configurations with TTFT/TPOT/ITL breakdowns and P50/P95/P99 distributions |
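The kind of KV-cache sizing that notebook 02 derives from architecture constants can be sketched for the model used here. The constants below come from the published Llama-3.1-8B configuration (32 layers, GQA with 8 KV heads, head dimension 128); the per-token figure follows directly.

```python
# KV cache size per token for meta-llama/Llama-3.1-8B-Instruct,
# from the model's published architecture constants (GQA: 8 KV heads).
num_layers = 32
num_kv_heads = 8
head_dim = 128
dtype_bytes = 2  # fp16/bf16

# K and V each store num_kv_heads * head_dim values per layer.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
print(f"{bytes_per_token} bytes/token ({bytes_per_token / 1024:.0f} KiB)")

# The KV cache that disaggregated serving must ship between nodes
# for a 2048-token prompt:
prompt_tokens = 2048
total = bytes_per_token * prompt_tokens
print(f"{total / 1024**2:.0f} MiB for a {prompt_tokens}-token prefill")
```

That 128 KiB/token figure is what makes the TCP-vs-RDMA transfer comparison in notebook 02 interesting: long prompts produce caches in the hundreds of MiB.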

### Kubernetes Cluster Setup

Deploy a 3-node Kubernetes cluster (1 CPU controller + 2 DGX Spark GPU workers) using MicroK8s, with GPU Operator configuration for containerized inference.

| Notebook                  | Description                                                            |
| ------------------------- | ---------------------------------------------------------------------- |
| 01_MicroK8s_Cluster_Setup | Cluster formation, GPU Operator install, vLLM deployment on Kubernetes |
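The cluster-formation flow in the notebook boils down to a few MicroK8s commands; the sketch below assumes MicroK8s is already installed on all three nodes, and the GPU addon name may differ slightly across MicroK8s releases.

```shell
# On the controller: generate a join token (prints a `microk8s join` command).
microk8s add-node

# On each DGX Spark worker: run the printed join command, e.g.
#   microk8s join <controller-ip>:25000/<token>

# On the controller: enable GPU support (deploys the NVIDIA GPU Operator).
microk8s enable gpu

# Verify the workers advertise an allocatable GPU:
microk8s kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```

These are configuration steps against a live cluster, so they only make sense once both workers can reach the controller on port 25000.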

## Prerequisites

- Two DGX Spark nodes with SSH access between them
- RDMA-capable network (RoCE or InfiniBand)
- Python virtual environment with vLLM 0.13.0+, NIXL, and PyTorch 2.9.0+
- `meta-llama/Llama-3.1-8B-Instruct` cached on both nodes
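A quick preflight for the version floors above can be run in the virtual environment before starting notebook 00. The package names (`vllm`, `torch`) are their usual PyPI names; the comparison ignores pre-release and local-version tags, which is good enough for a sanity check.

```python
# Preflight: confirm the environment meets the version floors listed above.
from importlib.metadata import PackageNotFoundError, version

def meets_floor(installed: str, floor: str) -> bool:
    """Compare dotted versions numerically (strips +local tags, skips non-digits)."""
    def key(v: str) -> list[int]:
        return [int(p) for p in v.split("+")[0].split(".") if p.isdigit()]
    return key(installed) >= key(floor)

for pkg, floor in [("vllm", "0.13.0"), ("torch", "2.9.0")]:
    try:
        v = version(pkg)
        status = "OK" if meets_floor(v, floor) else f"too old (need {floor}+)"
        print(f"{pkg} {v}: {status}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```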
