fsdp

Star

Here are 26 public repositories matching this topic...

LambdaLabsML / distributed-training-guide

Star

Best practices & guides on how to write distributed pytorch training code

gpu cluster mpi cuda slurm pytorch sharding kuberentes distributed-training nccl gpu-cluster deepspeed fsdp lambdalabs

Updated Oct 22, 2025
Python

sgl-project / SpecForge

Star

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

training eagle pytorch llm fsdp sglang eagle3

Updated Dec 5, 2025
Python

meta-pytorch / torchft

Star

Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)

training distributed-systems fault-tolerance consensus ml torch pytorch hsdp fsdp localsgd diloco

Updated Dec 5, 2025
Python

tsiendragon / qwen-image-finetune

Star

Repo for Qwen Image Finetune

lora image-to-image diffusion-models fsdp peft-fine-tuning-llm flux-kontext qwen-image-edit image-edit-model qwen-image-edit-2509

Updated Dec 3, 2025
Jupyter Notebook

GURPREETKAURJETHRA / Meta-LLAMA3-GenAI-UseCases-End-To-End-Implementation-Guides

Star

META LLAMA3 GENAI Real World UseCases End To End Implementation Guide

pytorch fine-tuning sagemaker rag huggingface streamlit prompt-tuning generativeai langchain-python chromadb fsdp qlora ollama llama3 llama3-prompts llama3-rag llama3-finetune llama3-meta-ai llama3-70b-8192

Updated Sep 24, 2024
Jupyter Notebook

ruimalheiro / training-custom-llama

Star

Llama-style transformer in PyTorch with multi-node / multi-GPU training. Includes pretraining, fine-tuning, DPO, LoRA, and knowledge distillation. Scripts for dataset mixing and training from scratch.

Updated Nov 10, 2025
Python

SohamGovande / podplex

Star

🦾💻🌐 distributed training & serverless inference at scale on RunPod

serverless decentralized inference distributed llm runpod fsdp

Updated May 26, 2024
Jupyter Notebook

debnsuma / ray-for-developers

Star

A comprehensive hands-on guide to building production-grade distributed applications with Ray - from distributed training and multimodal data processing to inference and reinforcement learning.

machine-learning deep-learning distributed-computing pytorch ray ddp model-serving distributed-training multimodal mlops fsdp

Updated Oct 28, 2025
Python

AlibabaPAI / FlashModels

Star

Fast and easy distributed model training examples.

deep-learning pytorch zero data-parallelism model-parallelism distributed-training xla tensor-parallelism llm fsdp sequence-parallelism

Updated Nov 26, 2024
Python

arawxx / FSDP-Distributed-Training-of-ConvNextV2-on-CIFAR10

Star

A script for training the ConvNextV2 on CIFAR10 dataset using the FSDP technique for a distributed training scheme.

deep-learning distributed-computing pytorch distributed distributed-training distributed-learning convnext convnextv2 fsdp fully-sharded-data-parallel

Updated Dec 11, 2023
Python

zigzagcai / DeepSeekV3

Star

Simple and efficient implementation of 671B DeepSeek V3 that trainable with FSDP+EP and minimal requirement of 256x A100/H100, targeted for HuggingFace ecosystem

reinforcement-learning fsdp expert-parallel

Updated Nov 3, 2025
Python

SulRash / minLLMTrain

Star

Minimal yet high performant code for pretraining llms. Attempts to implement some SOTA features. Implements training through: Deepspeed, Megatron-LM, and FSDP. WIP

huggingface pretraining deepspeed megatron-lm llm fsdp

Updated Feb 6, 2024
Python

liangyuwang / train-large-model-from-scratch

Star

A minimal, hackable pre-training stack for GPT-style language models

gpt llms fsdp

Updated Oct 20, 2025
Python

free001style / efficient-dl-systems

Star

Implementations of some popular approaches for efficient deep learning training and inference

profiler amp offloading multigpu quanti fsdp tensorparallel

Updated Mar 30, 2025
Python

abhilash1910 / Framework-Optimization

Sponsor

Star

Framework, Model & Kernel Optimizations for Distributed Deep Learning - Data Hack Summit

pytorch triton codegen inductor ddp deepspeed fsdp tensorparallel pipelineparallel

Updated Aug 1, 2023
Python

walln / loadax

Star

Dataloading for JAX

datasets ddp distributed-training dataloading jax xla fsdp

Updated Oct 3, 2024
Python

fereydoonboroojerdi / multimodal-customer-insights-generator

Star

Scalable multimodal AI system combining FSDP, RLHF, and Inferentia optimization for customer insights generation.

aws deep-learning pytorch customer-insights sagemaker inferentia rlhf fsdp multimodal-ai

Updated May 3, 2025
Python

wonmo4692 / qwen-image

Star

🎨 Generate high-quality images with the Qwen-Image model, a powerful text-to-image tool optimized for fast and efficient deployment on serverless architecture.

flux image-generation mps wan fine-tuning text2image deepspeed txt2img stable-diffusion ggml fsdp videogeneration video-language-model flux-dev flux-schnell flux-kontext qwen-image-edit image-edit-model

Updated Nov 15, 2025

salma2vec / shardspark

Star

Mini-FSDP for PyTorch. Minimal single-node Fully Sharded Data Parallel wrapper with param flattening, grad reduce-scatter, AMP, and tiny GPT/BERT training examples.

pytorch parallelism systems distributed-training fsdp

Updated Sep 21, 2025
Python

hyunnnchoi / google-t5-fsdp-kubeflow

Star

A foundational repository for setting up distributed training jobs using Kubeflow and PyTorch FSDP.

pytorch distributed-deep-learning kubeflow fsdp

Updated Jan 7, 2025
Python

Improve this page

Add a description, image, and links to the fsdp topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the fsdp topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fsdp

Here are 26 public repositories matching this topic...

LambdaLabsML / distributed-training-guide

sgl-project / SpecForge

meta-pytorch / torchft

tsiendragon / qwen-image-finetune

GURPREETKAURJETHRA / Meta-LLAMA3-GenAI-UseCases-End-To-End-Implementation-Guides

ruimalheiro / training-custom-llama

SohamGovande / podplex

debnsuma / ray-for-developers

AlibabaPAI / FlashModels

arawxx / FSDP-Distributed-Training-of-ConvNextV2-on-CIFAR10

zigzagcai / DeepSeekV3

SulRash / minLLMTrain

liangyuwang / train-large-model-from-scratch

free001style / efficient-dl-systems

abhilash1910 / Framework-Optimization

walln / loadax

fereydoonboroojerdi / multimodal-customer-insights-generator

wonmo4692 / qwen-image

salma2vec / shardspark

hyunnnchoi / google-t5-fsdp-kubeflow

Improve this page

Add this topic to your repo