Reproducible benchmark recipes for GPUs

Welcome to the reproducible benchmark recipes repository for GPUs! This repository contains recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.

Overview

Identify your requirements: Determine the model, GPU type, workload, framework, and orchestrator you are interested in.
Select a recipe: Based on your requirements, use the Benchmarks support matrix to find a recipe that meets your needs.
Follow the recipe: Each recipe provides procedures to complete the following tasks:
- Prepare your environment
- Run the benchmark
- Analyze the benchmark results, which include detailed logs for further analysis
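
The selection step above amounts to filtering the support matrix by your requirements. The following is a minimal sketch of that idea, not part of the repository: the `Recipe` class, the `MATRIX` list, and the `find_recipes` helper are hypothetical names introduced for illustration, and the rows are a small sample copied from the tables below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """One row of the support matrix (hypothetical structure, for illustration)."""
    model: str
    machine: str
    framework: str
    workload: str
    orchestrator: str


# A small sample of rows copied from the support matrix; not the full list.
MATRIX = [
    Recipe("Llama-3.1-70B", "A3 Mega", "NeMo", "Pre-training", "GKE"),
    Recipe("Llama-3.1-70B", "A3 Ultra", "MaxText", "Pre-training", "GKE"),
    Recipe("Llama-3.1-70B", "A3 Ultra", "NeMo", "Pre-training", "GKE"),
    Recipe("DeepSeek R1 671B", "A3 Ultra", "vLLM", "Inference", "GKE"),
    Recipe("Qwen3 8B", "G4", "vLLM", "Inference", "GCE"),
]


def find_recipes(matrix, **requirements):
    """Return the rows whose fields equal every given requirement."""
    return [
        row for row in matrix
        if all(getattr(row, field) == value for field, value in requirements.items())
    ]


# Example: which frameworks have a Llama-3.1-70B recipe on A3 Ultra?
matches = find_recipes(MATRIX, model="Llama-3.1-70B", machine="A3 Ultra")
for row in matches:
    print(row.framework, row.workload, row.orchestrator)
```

Any combination of fields can be passed as keyword arguments, for example `find_recipes(MATRIX, workload="Inference", orchestrator="GCE")`.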

Benchmarks support matrix

Training benchmarks A3 Mega

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
GPT3-175B	A3 Mega (NVIDIA H100)	NeMo	Pre-training	GKE	Link
Llama-3-70B	A3 Mega (NVIDIA H100)	NeMo	Pre-training	GKE	Link
Llama-3.1-70B	A3 Mega (NVIDIA H100)	NeMo	Pre-training	GKE	Link
Mixtral-8x7B	A3 Mega (NVIDIA H100)	NeMo	Pre-training	GKE	Link

Training benchmarks A3 Ultra

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
Llama-3.1-70B	A3 Ultra (NVIDIA H200)	MaxText	Pre-training	GKE	Link
Llama-3.1-70B	A3 Ultra (NVIDIA H200)	NeMo	Pre-training	GKE	Link
Llama-3.1-405B	A3 Ultra (NVIDIA H200)	MaxText	Pre-training	GKE	Link
Llama-3.1-405B	A3 Ultra (NVIDIA H200)	NeMo	Pre-training	GKE	Link
Mixtral-8x7B	A3 Ultra (NVIDIA H200)	NeMo	Pre-training	GKE	Link

Training benchmarks A4

Models	GPU Machine Type	Framework / Library	Workload Type	Orchestrator	Link to the recipe
Llama-3.1-70B	A4 (NVIDIA B200)	MaxText	Pre-training	GKE	Link
Llama-3.1-70B	A4 (NVIDIA B200)	NeMo	Pre-training	GKE	Link
Llama-3.1-405B	A4 (NVIDIA B200)	MaxText	Pre-training	GKE	Link
Llama-3.1-405B	A4 (NVIDIA B200)	NeMo	Pre-training	GKE	Link
Mixtral-8x7B	A4 (NVIDIA B200)	NeMo	Pre-training	GKE	Link
PaliGemma2	A4 (NVIDIA B200)	Hugging Face Accelerate	Fine-tuning	GKE	Link

Training benchmarks A4X

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
Llama-3.1-8B	A4X (NVIDIA GB200)	NeMo	Pre-training	GKE	Link
Llama-3.1-70B	A4X (NVIDIA GB200)	NeMo	Pre-training	GKE	Link
Llama-3.1-405B	A4X (NVIDIA GB200)	NeMo	Pre-training	GKE	Link

Inference benchmarks A3 Mega

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
Llama-4	A3 Mega (NVIDIA H100)	SGLang	Inference	GKE	Link
DeepSeek R1 671B	A3 Mega (NVIDIA H100)	SGLang	Inference	GKE	Link
DeepSeek R1 671B	A3 Mega (NVIDIA H100)	vLLM	Inference	GKE	Link

Inference benchmarks A3 Ultra

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
GPT OSS 120B	A3 Ultra (NVIDIA H200)	vLLM	Inference	GKE	Link
Llama-4	A3 Ultra (NVIDIA H200)	vLLM	Inference	GKE	Link
Llama-3.1-405B	A3 Ultra (NVIDIA H200)	TensorRT-LLM	Inference	GKE	Link
DeepSeek R1 671B	A3 Ultra (NVIDIA H200)	SGLang	Inference	GKE	Link
DeepSeek R1 671B	A3 Ultra (NVIDIA H200)	vLLM	Inference	GKE	Link

Inference benchmarks A4

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
DeepSeek R1 671B	A4 (NVIDIA B200)	vLLM	Inference	GKE	Link
DeepSeek R1 671B	A4 (NVIDIA B200)	SGLang	Inference	GKE	Link

Inference benchmarks G4

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
Qwen3 8B	G4 (NVIDIA RTX PRO 6000 Blackwell)	vLLM	Inference	GCE	Link
Qwen3 30B A3B	G4 (NVIDIA RTX PRO 6000 Blackwell)	TensorRT-LLM	Inference	GCE	Link
Qwen3 4B	G4 (NVIDIA RTX PRO 6000 Blackwell)	TensorRT-LLM	Inference	GCE	Link
Qwen3 8B	G4 (NVIDIA RTX PRO 6000 Blackwell)	TensorRT-LLM	Inference	GCE	Link
Qwen3 32B	G4 (NVIDIA RTX PRO 6000 Blackwell)	TensorRT-LLM	Inference	GCE	Link
Qwen3 32B	G4 (NVIDIA RTX PRO 6000 Blackwell)	vLLM	Inference	GCE	Link
Llama-3.1-70B	G4 (NVIDIA RTX PRO 6000 Blackwell)	TensorRT-LLM	Inference	GCE	Link
DeepSeek R1	G4 (NVIDIA RTX PRO 6000 Blackwell)	TensorRT-LLM	Inference	GCE	Link

Checkpointing benchmarks

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
Llama-3.1-70B	A3 Mega (NVIDIA H100)	NeMo	Pre-training using Google Cloud Storage buckets for checkpoints	GKE	Link

Goodput benchmarks

Models	GPU Machine Type	Framework	Workload Type	Orchestrator	Link to the recipe
Llama-3.1-70B	A3 Mega (NVIDIA H100)	NeMo	Pre-training using the Google Cloud Resiliency library	GKE	Link
Llama-3.1-405B	A3 Ultra (NVIDIA H200)	NeMo	Pre-training using the Google Cloud Resiliency library	GKE	Link
Mixtral-8x7B	A3 Ultra (NVIDIA H200)	NeMo	Pre-training using the Google Cloud Resiliency library	GKE	Link

Repository structure

training/: Contains recipes to reproduce training benchmarks with GPUs.
inference/: Contains recipes to reproduce inference benchmarks with GPUs.
src/: Contains shared dependencies required to run benchmarks, such as Docker and Helm charts.
docs/: Contains supporting documentation for the recipes, such as explanation of benchmark methodologies or configurations.
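
As a quick sanity check after cloning, you could confirm that the top-level directories listed above are present. This is a minimal sketch under stated assumptions: the `EXPECTED_DIRS` tuple and the `missing_dirs` helper are hypothetical names for illustration, and the path passed in is wherever you cloned the repository.

```python
from pathlib import Path

# Top-level directories named in the "Repository structure" list above.
EXPECTED_DIRS = ("training", "inference", "src", "docs")


def missing_dirs(repo_root):
    """Return the expected directory names that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the root of a local clone.
    absent = missing_dirs(".")
    if absent:
        print("Missing directories:", ", ".join(absent))
    else:
        print("All expected directories are present.")
```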

Getting help

If you have questions or find problems with this repository, please report them through GitHub issues.

Disclaimer

This is not an officially supported Google product. The code in this repository is for demonstrative purposes only.
