
Conversation

kohankhaki
Collaborator

  • This pull request introduces a new distributed finetuning template for LLMs, enabling scalable training with either DDP or FSDP, orchestrated by Hydra and Submitit. It adds a complete configuration, launch, and training pipeline, along with documentation and a compute config for multi-GPU training (see the sketch after this list).
  • Added a Slurm compute configuration (bon_echo/a40_4x.yaml) for running jobs on 4x A40 GPU nodes, including resource and partition settings (an illustrative version follows below).
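
To give a rough sense of how these pieces fit together, here is a minimal sketch of a Hydra entry point that could be dispatched through Submitit and wraps a model in either DDP or FSDP. Everything here is an assumption for illustration, not the template's actual API: `build_model`, `train_loop`, the `cfg.strategy` key, and the config paths are all hypothetical stand-ins.

```python
# Sketch only: Hydra entry point, launched via the Submitit Slurm launcher,
# selecting DDP or FSDP. Helpers and config keys below are hypothetical.
import os

import hydra
import torch
import torch.distributed as dist
from omegaconf import DictConfig
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.nn.parallel import DistributedDataParallel as DDP


def build_model(cfg: DictConfig) -> torch.nn.Module:
    # Stand-in for the template's real model factory.
    return torch.nn.Linear(16, 16)


def train_loop(model: torch.nn.Module, cfg: DictConfig) -> None:
    # Stand-in for the template's real training loop.
    pass


@hydra.main(config_path="configs", config_name="finetune", version_base=None)
def main(cfg: DictConfig) -> None:
    # One process per GPU; the launcher sets RANK, LOCAL_RANK, WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = build_model(cfg).cuda()
    if cfg.strategy == "fsdp":
        # FSDP shards parameters, gradients, and optimizer state across ranks.
        model = FSDP(model)
    else:
        # DDP replicates the model per rank and all-reduces gradients.
        model = DDP(model, device_ids=[local_rank])

    train_loop(model, cfg)
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With Hydra's Submitit plugin installed, a launch might then look like `python finetune.py -m hydra/launcher=submitit_slurm` (again, the script name and config group are assumptions, not the template's documented commands).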
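
A compute config in the spirit of bon_echo/a40_4x.yaml might look like the following. Every key here is illustrative, mirroring the Hydra Submitit launcher's common Slurm parameters rather than the file's actual contents.

```yaml
# Illustrative only; the real bon_echo/a40_4x.yaml may use different keys.
hydra:
  launcher:
    nodes: 1
    gpus_per_node: 4
    tasks_per_node: 4      # one process per GPU
    cpus_per_task: 8
    mem_gb: 128
    partition: a40
    timeout_min: 240
```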

@kohankhaki kohankhaki requested a review from jwilles September 18, 2025 14:02