Open
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
Description
What is the problem you're trying to solve
The Volcano scheduler currently supports CPU NUMA topology awareness through the numaaware plugin, but it does not consider GPU NUMA topology when scheduling GPU workloads.
Example scenario:
Consider a cluster with two GPU nodes and a task requesting 4 GPUs:
Node A:
- 8 GPUs total (GPUs 0-7)
- GPUs 0-3 are on NUMA node 0
- GPUs 4-7 are on NUMA node 1
- Available: GPUs 0-2 on NUMA 0, GPU 7 on NUMA 1
Node B:
- 8 GPUs total (GPUs 0-7)
- GPUs 0-3 are on NUMA node 0
- GPUs 4-7 are on NUMA node 1
- Available: GPUs 0-5 (all four GPUs on NUMA 0, plus GPUs 4-5 on NUMA 1)
Without GPU NUMA awareness, the scheduler might choose Node A and allocate GPUs 0, 1, 2, and 7, which span both NUMA nodes. With GPU NUMA awareness, the scheduler should prefer Node B, where it can allocate GPUs 0-3 from a single NUMA node and avoid cross-NUMA traffic, which typically gives better performance.
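To make the expected behavior concrete, here is a minimal Go sketch of the node preference described in the scenario. It is not Volcano code: the `Node` type and the `fitsSingleNUMA`, `totalFree`, and `pickNode` functions are hypothetical names used only for illustration, and the rule "prefer any node that can serve the request from one NUMA node" is an assumption about how the preference could work.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a GPU node (not a Volcano type).
type Node struct {
	Name string
	// FreeGPUsByNUMA maps a NUMA node ID to the IDs of the free GPUs attached to it.
	FreeGPUsByNUMA map[int][]int
}

// fitsSingleNUMA reports whether one NUMA node alone has at least `request` free GPUs.
func fitsSingleNUMA(n Node, request int) bool {
	for _, gpus := range n.FreeGPUsByNUMA {
		if len(gpus) >= request {
			return true
		}
	}
	return false
}

// totalFree counts the free GPUs on the node across all NUMA nodes.
func totalFree(n Node) int {
	total := 0
	for _, gpus := range n.FreeGPUsByNUMA {
		total += len(gpus)
	}
	return total
}

// pickNode returns the first node that can serve the request from a single
// NUMA node. If there is none, it falls back to the first node that has
// enough free GPUs overall. It returns "" when no node fits.
func pickNode(nodes []Node, request int) string {
	fallback := ""
	for _, n := range nodes {
		if totalFree(n) < request {
			continue
		}
		if fitsSingleNUMA(n, request) {
			return n.Name
		}
		if fallback == "" {
			fallback = n.Name
		}
	}
	return fallback
}

func main() {
	// The two nodes from the example scenario.
	nodeA := Node{Name: "node-a", FreeGPUsByNUMA: map[int][]int{0: {0, 1, 2}, 1: {7}}}
	nodeB := Node{Name: "node-b", FreeGPUsByNUMA: map[int][]int{0: {0, 1, 2, 3}, 1: {4, 5}}}

	fmt.Println(pickNode([]Node{nodeA, nodeB}, 4)) // prints "node-b"
}
```

Node A has four free GPUs in total, so it remains a valid fallback, but Node B wins because NUMA 0 alone covers the request.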
Describe the solution you'd like
Extend the numaaware plugin to support GPU NUMA topology awareness:
- GPU Topology Information Collection
- GPU HintProvider Implementation
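The two items above could fit together roughly as in the following sketch, which takes per-GPU topology information and turns it into NUMA hints. Everything here is hypothetical: `GPU`, `Hint`, `gpuHints`, and `eightGPUNode` are illustrative names, not the numaaware plugin's actual types or interface, and the topology is supplied in memory because the issue does not specify how it would be collected. The "preferred" rule (a hint is preferred when it uses the smallest number of NUMA nodes that could hold the request given total GPU capacity) is an assumption, loosely modeled on how topology hints are commonly described.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// GPU is a hypothetical record produced by topology collection.
type GPU struct {
	ID       int
	NUMANode int
	Free     bool
}

// Hint is a hypothetical topology hint: a set of NUMA nodes whose free GPUs
// can satisfy the request, and whether that set is a preferred placement.
type Hint struct {
	NUMANodes []int
	Preferred bool
}

// gpuHints enumerates every combination of NUMA nodes with enough free GPUs
// for the request. A hint is preferred when it uses the minimum number of
// NUMA nodes that could hold the request based on total capacity.
func gpuHints(gpus []GPU, request int) []Hint {
	if request <= 0 {
		return nil
	}
	capacity := map[int]int{}
	free := map[int]int{}
	for _, g := range gpus {
		capacity[g.NUMANode]++
		if g.Free {
			free[g.NUMANode]++
		}
	}
	ids := make([]int, 0, len(capacity))
	for id := range capacity {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	var hints []Hint
	minSize := len(ids) + 1
	for mask := uint(1); mask < uint(1)<<uint(len(ids)); mask++ {
		capTotal, freeTotal := 0, 0
		var nodes []int
		for i, id := range ids {
			if mask&(uint(1)<<uint(i)) != 0 {
				capTotal += capacity[id]
				freeTotal += free[id]
				nodes = append(nodes, id)
			}
		}
		if size := bits.OnesCount(mask); capTotal >= request && size < minSize {
			minSize = size
		}
		if freeTotal >= request {
			hints = append(hints, Hint{NUMANodes: nodes})
		}
	}
	for i := range hints {
		hints[i].Preferred = len(hints[i].NUMANodes) == minSize
	}
	return hints
}

// eightGPUNode builds the layout from the example scenario: GPUs 0-3 on
// NUMA 0, GPUs 4-7 on NUMA 1, with the listed GPU IDs marked free.
func eightGPUNode(freeIDs ...int) []GPU {
	isFree := map[int]bool{}
	for _, id := range freeIDs {
		isFree[id] = true
	}
	gpus := make([]GPU, 8)
	for id := range gpus {
		gpus[id] = GPU{ID: id, NUMANode: id / 4, Free: isFree[id]}
	}
	return gpus
}

func main() {
	fmt.Printf("Node A: %+v\n", gpuHints(eightGPUNode(0, 1, 2, 7), 4))
	fmt.Printf("Node B: %+v\n", gpuHints(eightGPUNode(0, 1, 2, 3, 4, 5), 4))
}
```

Under these assumptions, Node A yields only a non-preferred hint that spans both NUMA nodes, while Node B yields a preferred hint for NUMA 0 alone, which is the distinction a scheduler could use to rank the two nodes.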
Additional context
No response
Documentation Updates
- This feature requires design or user documentation changes.
- If documentation changes are required, I will ensure the relevant documents are updated and published to the Volcano official website (https://volcano.sh) via the volcano-sh/website repository.