Fixes TypeError and infinite looping in MPITaskScheduler #3783
Description
This PR attempts to fix the following bugs in the MPITaskScheduler:
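- `schedule_backlog_tasks` takes tasks from the backlog and tries to schedule them until the queue is empty. However, `put_task` pushes any task that cannot be scheduled back onto the backlog queue, so the loop never terminates if at least one backlogged task cannot be scheduled.
- `TypeError: unhashable type: dict`: backlogged tasks are stored in a priority queue together with their task `dict`, and when two entries have the same priority the queue ends up comparing the task dicts, which Python does not support.
Changed Behaviour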
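The changed backlog behaviour can be illustrated with a small, self-contained sketch. It is not the Parsl implementation: `BacklogSketch`, `free_nodes`, `release` and the `"num_nodes"` key in the task dict are hypothetical stand-ins for MPITaskScheduler's node bookkeeping. Only `PrioritizedTask` with its `task: dict` field, `put_task`, `schedule_backlog_tasks` and the `num_nodes * -1` priority come from this PR's description.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(order=True)
class PrioritizedTask:
    # Only `priority` takes part in ordering; the task dict is excluded,
    # so two entries with equal priority never compare their dicts.
    priority: int
    task: Dict[str, Any] = field(compare=False)


class BacklogSketch:
    """Hypothetical stand-in for MPITaskScheduler's backlog handling."""

    def __init__(self, total_nodes: int) -> None:
        self.free_nodes = total_nodes  # hypothetical free-node counter
        self.backlog = queue.PriorityQueue()
        self.launched: List[Dict[str, Any]] = []

    def put_task(self, task: Dict[str, Any]) -> None:
        nodes_needed = task["num_nodes"]
        if nodes_needed > self.free_nodes:
            # Not enough nodes: push the task back onto the backlog.
            # Negating num_nodes makes larger jobs come out of the min-heap first.
            self.backlog.put(PrioritizedTask(nodes_needed * -1, task))
            return
        self.free_nodes -= nodes_needed
        self.launched.append(task)

    def release(self, num_nodes: int) -> None:
        # Hypothetical hook: nodes freed by a finished task trigger a retry.
        self.free_nodes += num_nodes
        self.schedule_backlog_tasks()

    def schedule_backlog_tasks(self) -> None:
        # Drain the backlog first. put_task may re-queue tasks that still do
        # not fit, so looping "until the queue is empty" would never end.
        pending = []
        while not self.backlog.empty():
            pending.append(self.backlog.get(block=False).task)
        # Each drained task is tried exactly once per call.
        for task in pending:
            self.put_task(task)


if __name__ == "__main__":
    sched = BacklogSketch(total_nodes=4)
    sched.put_task({"id": "a", "num_nodes": 3})  # runs, 1 node left
    sched.put_task({"id": "b", "num_nodes": 2})  # backlogged
    sched.put_task({"id": "c", "num_nodes": 4})  # backlogged
    sched.schedule_backlog_tasks()  # nothing fits; returns instead of looping
    sched.release(3)  # 4 nodes free: "c" (larger) is tried first
    print([t["id"] for t in sched.launched])  # ['a', 'c']
```

Draining first bounds each call of `schedule_backlog_tasks` to one attempt per backlogged task; a task that still does not fit is re-queued for the next call instead of being retried forever. Because `queue.PriorityQueue` returns the smallest item first, negating the node count means the largest jobs are tried first.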
Fixes
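- `schedule_backlog_tasks` now first fetches every task currently in the `backlog_queue` and only then attempts to schedule them, so tasks re-queued by `put_task` are not retried within the same call and the infinite loop is avoided.
- A `PrioritizedTask` dataclass is added that disables comparison on its `task: dict` field, so only the priority is compared.
- The backlog priority is set to `num_nodes * -1` so that larger jobs are prioritized (the priority queue returns the smallest value first).
Type of change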