Make multiprocessing terminate gracefully #53

Open
bengioe opened this issue Feb 22, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@bengioe
Collaborator

bengioe commented Feb 22, 2023

The current multiprocessing/threading routines are never stopped explicitly; they stop only when the objects they belong to are garbage collected. This sometimes produces aesthetically displeasing logs in which every thread emits errors.
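For illustration, an explicit stop could look roughly like the sketch below: the worker exits on a sentinel value and also re-checks a stop flag, and the owner joins it instead of waiting for garbage collection. The names `worker` and `run_and_stop` are hypothetical and not taken from this repository; this is a minimal sketch using only the standard library, not the project's actual code.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: list, stop: threading.Event) -> None:
    """Process items until a sentinel arrives or the stop flag is set."""
    while not stop.is_set():
        try:
            # Use a timeout so the loop wakes up regularly to re-check the flag.
            item = tasks.get(timeout=0.1)
        except queue.Empty:
            continue
        if item is None:  # sentinel: no more work is coming
            break
        results.append(item * 2)


def run_and_stop(items):
    """Hypothetical owner: start the worker, feed it, then stop it explicitly."""
    tasks, results, stop = queue.Queue(), [], threading.Event()
    thread = threading.Thread(target=worker, args=(tasks, results, stop))
    thread.start()
    for item in items:
        tasks.put(item)
    tasks.put(None)          # ask the worker to finish after draining the queue
    thread.join(timeout=5)
    stop.set()               # fallback in case the sentinel was never consumed
    thread.join(timeout=5)
    return results, thread.is_alive()
```

Calling `run_and_stop([1, 2, 3])` processes the queued items in order and returns only after the worker thread has been joined, so no thread is left to fail during interpreter shutdown.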

@bengioe
Collaborator Author

bengioe commented Feb 6, 2024

Addressed by #117 and #116. I will leave this up as a reminder to run tests but most problems on this front are presumably solved.

@SobhanMP
Contributor

I did my best to flush all the queues, but I think it still freezes on rare occasions on Beluga (Compute Canada / Calcul Québec). I do not understand why it happens only on Beluga and not on Cedar, Narval, or Mila's cluster. I have this snippet of code that I use when running jobs on the clusters 🙃

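import os
import signal

# Watchdog: when SIGALRM fires, the process sends itself SIGTERM.
def haragiri(signum, frame):
    os.kill(os.getpid(), signal.SIGTERM)
signal.signal(signal.SIGALRM, haragiri)
signal.alarm(10 * 60)  # fire after 10 minutes (Unix only)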

Another thing to consider is making the code work with other multiprocessing start methods. AFAIK calling set_start_method with spawn or forkserver does not currently work, even though those are the "recommended" ways of starting new processes.
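As a rough sketch of what testing under a different start method might involve: `multiprocessing.get_context` returns a context bound to one start method without changing the global default. The names `square` and `map_with_start_method` are hypothetical and unrelated to this repository's code.

```python
import multiprocessing as mp


def square(x: int) -> int:
    # Must be a module-level function: with "spawn" the child process
    # imports the module and looks the target up by name.
    return x * x


def map_with_start_method(values, method: str = "spawn", workers: int = 2):
    """Run `square` over `values` in a pool created with an explicit start method."""
    ctx = mp.get_context(method)  # leaves the global default untouched
    with ctx.Pool(processes=workers) as pool:
        return pool.map(square, values)
```

With `spawn` or `forkserver`, the call should be made from a script under an `if __name__ == "__main__":` guard, and everything passed to the workers has to be picklable, which is typically where code written with `fork` in mind breaks.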
