Make multiprocessing terminate gracefully #53

bengioe · 2023-02-22T19:45:37Z

Current multiprocessing/threading routines are not explicitly stopped; they rely on the objects they belong to being garbage collected. This sometimes causes aesthetically displeasing logs where all the threads produce errors.
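For illustration only, here is a minimal sketch of what an explicit stop could look like, as opposed to relying on garbage collection. The `Worker` class and its methods are hypothetical and not taken from this repository. It uses a thread for brevity; the same stop-flag-plus-join pattern should carry over to `multiprocessing.Process` with a `multiprocessing.Event`.

```python
import queue
import threading


class Worker:
    """Hypothetical background worker that is stopped explicitly."""

    def __init__(self):
        self.tasks = queue.Queue()
        self.results = []
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Poll with a timeout so the stop flag is checked regularly
        # instead of blocking forever on an empty queue.
        while not self._stop_event.is_set():
            try:
                item = self.tasks.get(timeout=0.1)
            except queue.Empty:
                continue
            self.results.append(item * 2)

    def stop(self, timeout=5.0):
        # Ask the loop to exit, then wait for the thread to finish.
        # Returns True if the thread actually terminated in time.
        self._stop_event.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.stop()
```

Used as a context manager (`with Worker() as w: ...`), the worker is shut down deterministically on exit, so no errors are logged from threads that outlive their owner.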

bengioe · 2024-02-06T22:17:36Z

Addressed by #117 and #116. I will leave this up as a reminder to run tests but most problems on this front are presumably solved.

SobhanMP · 2024-02-17T05:25:35Z

I did my best to flush all the queues, but I still think it freezes on rare occasions on Béluga (Compute Canada / Calcul Québec). I do not understand why this happens only on Béluga and not on Cedar, Narval, or Mila's cluster. Here is a snippet I use when running jobs on the clusters 🙃

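import os
import signal

def haragiri(signum, frame):
    # Watchdog handler: terminate this process when the alarm fires.
    os.kill(os.getpid(), signal.SIGTERM)
signal.signal(signal.SIGALRM, haragiri)
signal.alarm(10 * 60)  # deliver SIGALRM after 10 minutes (Unix only)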

Another thing to consider is making the code work with other multiprocessing start methods. AFAIK set_start_method with spawn or forkserver does not currently work, even though these are the "recommended" ways of starting new processes.
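As a rough sketch of what spawn/forkserver compatibility requires (not code from this repository; `square`, `make_context`, and `run_square` are hypothetical names): the target function has to be importable at module top level, and every argument passed to the child has to be picklable. Using `multiprocessing.get_context` keeps the choice local instead of changing the global start method.

```python
import multiprocessing as mp


def square(x, out_queue):
    # Must live at module top level so that spawn/forkserver children,
    # which import this module afresh, can locate it by name.
    out_queue.put(x * x)


def make_context(method="spawn"):
    # A local context avoids the global set_start_method(), which is
    # meant to be called at most once per program.
    if method not in mp.get_all_start_methods():
        raise ValueError(f"start method {method!r} is unavailable on this platform")
    return mp.get_context(method)


def run_square(x, method="spawn"):
    ctx = make_context(method)
    out_queue = ctx.Queue()
    proc = ctx.Process(target=square, args=(x, out_queue))
    proc.start()
    # Read the result before joining so the child is never blocked
    # on a queue that nobody is draining.
    result = out_queue.get(timeout=30)
    proc.join(timeout=30)
    return result
```

With spawn or forkserver, `run_square` should only be called from code guarded by `if __name__ == "__main__":`, since the child re-imports the main module on startup.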

bengioe added the enhancement New feature or request label Feb 22, 2023

bengioe mentioned this issue Mar 20, 2023

Log-reward uniformisation, logging and new hyperparams for temperature, Z-training, validation. #54

Merged
