Prepare executor for SN21 integration #382


Merged
merged 2 commits into from
Jan 25, 2025


slawomir-gorawski-reef
Contributor

@slawomir-gorawski-reef slawomir-gorawski-reef commented Jan 24, 2025

Changes summary:

  • Increase executor timeouts. This is required because 6 min is too low for SN21 jobs, which need at least 12–15 min on an A6000. I'm open to adjusting the values or handling this some better way.
  • Add Adal's Hugging Face download optimization env var to the dev executor class.
  • Add more logging to the executor, including the job's stdout and stderr. If that's too much, I can remove it.
  • Add more logging to the executor manager, including when executors are killed. It wasn't clear to me what was happening and why.
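
For readers unfamiliar with the pattern, here is a minimal sketch of running a job under a timeout while logging its stdout and stderr. It is not the code from this PR: `run_job` and the logger name are hypothetical, and only the 20-minute value comes from the diff.

```python
import logging
import subprocess
import sys
from datetime import timedelta

logger = logging.getLogger("executor_sketch")  # hypothetical logger name

# The 20-minute value is from this PR; everything else is illustrative.
MAX_EXECUTOR_TIMEOUT = timedelta(minutes=20).total_seconds()


def run_job(cmd, timeout=MAX_EXECUTOR_TIMEOUT):
    """Run cmd, log its output, and return the exit code (None on timeout)."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        logger.error("Job timed out after %s seconds", exc.timeout)
        return None
    logger.info("Job exited with code %d", result.returncode)
    logger.info("Job stdout:\n%s", result.stdout)
    logger.info("Job stderr:\n%s", result.stderr)
    return result.returncode


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_job([sys.executable, "-c", "print('hello from job')"])
```

Logging the full stdout and stderr can get noisy for long jobs, so a real executor might truncate the output or log it at debug level.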

@slawomir-gorawski-reef slawomir-gorawski-reef marked this pull request as ready for review January 24, 2025 16:58
Comment on lines 71 to 73
# we split 144min 2 tempos window to 24 validators - this is total time after reservation,
# validator may wait spin_up time of executor class to synchronize running synthetic batch
- MAX_EXECUTOR_TIMEOUT = timedelta(minutes=6).total_seconds()
+ MAX_EXECUTOR_TIMEOUT = timedelta(minutes=20).total_seconds()
Contributor


I think the above comment became very confusing after this change. IMO removing it would be better.

Contributor Author


Yeah, I agree, I was gonna do it but I forgot. Thanks!
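
One plausible reading of why the old code comment stopped matching the new value, sketched as arithmetic. The 144-minute window and the 24 validators are taken from the comment under review; the interpretation that 6 minutes was derived as 144 / 24 is an assumption, not something the thread confirms.

```python
from datetime import timedelta

# Figures quoted in the code comment under review.
WINDOW = timedelta(minutes=144)  # "144min 2 tempos window"
VALIDATORS = 24

# Assumed derivation of the old value: an equal share of the window per validator.
per_validator = WINDOW / VALIDATORS

old_timeout = timedelta(minutes=6).total_seconds()
new_timeout = timedelta(minutes=20).total_seconds()

print(per_validator.total_seconds() == old_timeout)  # True: 144 / 24 = 6 minutes
print(new_timeout * VALIDATORS / 60)                 # 480.0 minutes, more than the 144-minute window
```

Under that reading, a 20-minute timeout no longer follows from splitting the window evenly, which fits the suggestion to remove the comment.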

@slawomir-gorawski-reef slawomir-gorawski-reef merged commit 630acb6 into master Jan 25, 2025
15 checks passed
@slawomir-gorawski-reef slawomir-gorawski-reef deleted the executor-timeouts branch January 25, 2025 15:33
2 participants