
Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.


AI-Hypercomputer/gpu-recipes


Reproducible benchmark recipes for GPUs


Welcome to the reproducible benchmark recipes repository for GPUs! This repository contains recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.

Overview

  1. Identify your requirements: Determine the model, GPU type, workload, framework, and orchestrator you are interested in.
  2. Select a recipe: Based on your requirements, use the Benchmarks support matrix to find a recipe that meets your needs.
  3. Follow the recipe: Each recipe provides procedures to complete the following tasks:
    • Prepare your environment
    • Run the benchmark
    • Analyze the benchmark results, including the detailed logs produced for further analysis

Benchmarks support matrix

Training benchmarks A3 Mega

| Model | GPU machine type | Framework | Workload type | Orchestrator | Link to the recipe |
| --- | --- | --- | --- | --- | --- |
| GPT3-175B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
| Llama-3-70B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
| Llama-3.1-70B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
| Mixtral-8-7B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |

Training benchmarks A3 Ultra

| Model | GPU machine type | Framework | Workload type | Orchestrator | Link to the recipe |
| --- | --- | --- | --- | --- | --- |
| Llama-3.1-70B | A3 Ultra (NVIDIA H200) | MaxText | Pre-training | GKE | Link |
| Llama-3.1-70B | A3 Ultra (NVIDIA H200) | NeMo | Pre-training | GKE | Link |
| Mixtral-8-7B | A3 Ultra (NVIDIA H200) | MaxText | Pre-training | GKE | Link |
| Mixtral-8-7B | A3 Ultra (NVIDIA H200) | NeMo | Pre-training | GKE | Link |
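
The "Select a recipe" step from the overview can be sketched as a simple lookup over the two training tables above. This is only an illustration and not part of the repository: the `Recipe` class, the `RECIPES` list, and the `find_recipes` helper are hypothetical names, and the rows are transcribed by hand from the support matrix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Recipe:
    """One row of the support matrix (hypothetical helper type)."""
    model: str
    machine: str       # GPU machine type, e.g. "A3 Mega"
    gpu: str           # accelerator listed for that machine type
    framework: str
    workload: str
    orchestrator: str


# Rows transcribed by hand from the two training tables above.
RECIPES: List[Recipe] = [
    Recipe("GPT3-175B", "A3 Mega", "NVIDIA H100", "NeMo", "Pre-training", "GKE"),
    Recipe("Llama-3-70B", "A3 Mega", "NVIDIA H100", "NeMo", "Pre-training", "GKE"),
    Recipe("Llama-3.1-70B", "A3 Mega", "NVIDIA H100", "NeMo", "Pre-training", "GKE"),
    Recipe("Mixtral-8-7B", "A3 Mega", "NVIDIA H100", "NeMo", "Pre-training", "GKE"),
    Recipe("Llama-3.1-70B", "A3 Ultra", "NVIDIA H200", "MaxText", "Pre-training", "GKE"),
    Recipe("Llama-3.1-70B", "A3 Ultra", "NVIDIA H200", "NeMo", "Pre-training", "GKE"),
    Recipe("Mixtral-8-7B", "A3 Ultra", "NVIDIA H200", "MaxText", "Pre-training", "GKE"),
    Recipe("Mixtral-8-7B", "A3 Ultra", "NVIDIA H200", "NeMo", "Pre-training", "GKE"),
]


def find_recipes(
    model: Optional[str] = None,
    machine: Optional[str] = None,
    framework: Optional[str] = None,
) -> List[Recipe]:
    """Return every recipe that matches all criteria that were given.

    A criterion left as None is ignored; matching is case-insensitive.
    """
    wanted = {"model": model, "machine": machine, "framework": framework}
    return [
        recipe
        for recipe in RECIPES
        if all(
            value is None or getattr(recipe, field).lower() == value.lower()
            for field, value in wanted.items()
        )
    ]


if __name__ == "__main__":
    # Example requirement: MaxText recipes on A3 Ultra machines.
    for recipe in find_recipes(machine="A3 Ultra", framework="MaxText"):
        print(recipe.model, "on", recipe.machine, "with", recipe.framework)
```

With the rows above, filtering on `machine="A3 Ultra"` and `framework="MaxText"` yields the Llama-3.1-70B and Mixtral-8-7B recipes, while a combination that is not in the matrix (for example GPT3-175B on A3 Ultra) yields an empty list.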

Repository structure

  • training/: Contains recipes to reproduce training benchmarks with GPUs.
  • src/: Contains shared dependencies required to run benchmarks, such as Docker and Helm charts.
  • docs/: Contains supporting documentation for the recipes, such as explanations of benchmark methodologies and configurations.

Getting help

If you have questions or find problems with this repository, please report them through GitHub issues.

Disclaimer

This is not an officially supported Google product. The code in this repository is for demonstration purposes only.
