Downloading models in distributed training #36414

Open
nikonikolov opened this issue Feb 26, 2025 · 2 comments

@nikonikolov

When I run distributed training and the model is not already downloaded to local disk, the ranks race each other for the download and crash.

I am looking for a fix such that:

  1. If the model is not yet downloaded to disk, only one rank downloads it. The remaining ranks wait until the download has finished.
  2. If the model is already on disk, all ranks load it simultaneously, without waiting for each other.
  3. The solution is universal. In other words, I still instantiate the model via AutoModel rather than through some wrapper function, and I don't have to write a bunch of if-else statements every time I need to create a model.

I wasn't able to find anything that achieves this right now. I guess a very simple solution could be to add lock files when downloading a model, so that the other ranks wait for the download to complete and then use the downloaded files directly.
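
A minimal sketch of the lock-file idea above, not an existing transformers feature. It assumes all ranks share one filesystem that honors `fcntl.flock` (POSIX only; some network filesystems do not). `ensure_downloaded`, the `download_fn` callback, and the marker and lock file names are hypothetical, chosen only for illustration. It covers points 1 and 2 but not point 3, since it is still a wrapper called before the model is created.

```python
import fcntl
from pathlib import Path
from typing import Callable


def ensure_downloaded(target_dir: str, download_fn: Callable[[Path], None]) -> bool:
    """Run download_fn(target_dir) in at most one process.

    Returns True if this call performed the download, False otherwise.
    """
    target = Path(target_dir)
    done_marker = target / ".download_complete"  # hypothetical marker name

    # Fast path: the files are already on disk, so no rank touches the lock.
    if done_marker.exists():
        return False

    target.mkdir(parents=True, exist_ok=True)
    lock_path = target / ".download.lock"  # hypothetical lock file name
    with open(lock_path, "w") as lock_file:
        # Blocks until whichever rank holds the lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Re-check: another rank may have finished while this one waited.
            if done_marker.exists():
                return False
            download_fn(target)
            # Written only after a successful download, so a crashed
            # download is retried by the next rank that gets the lock.
            done_marker.touch()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each rank would call `ensure_downloaded(...)` with the same directory before loading the model from it. The callback could, for example, wrap `huggingface_hub.snapshot_download`; that pairing is an assumption and not something tested in this thread.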

@Rocketknight1
Member

Hi @nikonikolov, can you give us a simple reproducer script to demonstrate this?

@abhinavachu23

import random

# Read the inclusive range the secret number is drawn from.
lower = int(input("Enter the lower bound: "))
upper = int(input("\nEnter the upper bound: "))
if lower >= upper:
    print("\nUpper bound must be greater than lower bound")
    raise SystemExit(1)

num = random.randint(lower, upper)
chances = 3
print(f"\nYou have {chances} chances to guess the number!\n")

count = 0
guessed = False
while count < chances:
    count += 1
    guess = int(input("Guess a number: "))
    if num == guess:
        print(f"Congratulations, you did it in {count} attempt(s)!")
        guessed = True
        break
    elif num > guess:
        print("You guessed too low.")
    else:
        print("You guessed too high.")

# Reveal the answer only if every chance was used without a correct guess.
if not guessed:
    print(f"\nThe number is {num}")
    print("\nBetter luck next time!")

Fix this

Fix thisssss
