Summary
Problem
Slurm-bridge does not support colocating multiple pods on a single multi-GPU node, resulting in underutilization when workloads require fewer GPUs than the node provides.
Solution
This adds an optional workload annotation `slurmjob.slinky.slurm.net/shared`, accepting the Slurm shared-policy values (`mcs`, `none`, `oversubscribe`, `topo`, `user`), on workloads that have a 1:1 relationship between Slurm jobs and pods. This excludes PodGroup and LeaderWorkerSet resources. The admission controller validates the annotation to ensure correctness.
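A minimal sketch of the validation the admission controller might perform. The function and its signature are hypothetical; only the annotation key, the accepted value set, and the exclusion of group workloads come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation key and the Slurm shared-policy values it accepts,
// as described in this PR.
const sharedAnnotation = "slurmjob.slinky.slurm.net/shared"

var validSharedValues = map[string]bool{
	"mcs": true, "none": true, "oversubscribe": true, "topo": true, "user": true,
}

// validateShared is a hypothetical admission check: it accepts an absent
// annotation, rejects unknown values, and rejects the annotation on group
// workloads, which lack a 1:1 job-to-pod relationship.
func validateShared(annotations map[string]string, isGroupWorkload bool) error {
	val, ok := annotations[sharedAnnotation]
	if !ok {
		return nil // annotation is optional; default behavior is unchanged
	}
	if isGroupWorkload {
		return fmt.Errorf("%s is not supported on group workloads (PodGroup, LeaderWorkerSet)", sharedAnnotation)
	}
	if !validSharedValues[strings.ToLower(val)] {
		return fmt.Errorf("invalid %s value %q; must be one of mcs, none, oversubscribe, topo, user", sharedAnnotation, val)
	}
	return nil
}
```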
The scheduler then applies the `shared` setting when creating the Slurm job.
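How the scheduler might propagate the annotation into the job it submits can be sketched as follows; `JobOptions` and its `Shared` field are hypothetical stand-ins for the real submission payload, not the slurm-bridge API.

```go
package main

// JobOptions is a hypothetical stand-in for the Slurm job submission
// payload the scheduler builds.
type JobOptions struct {
	Shared string // e.g. "oversubscribe"; empty means Slurm's default
}

// buildJobOptions copies the shared annotation, if present, onto the job.
// The admission controller has already validated the value.
func buildJobOptions(annotations map[string]string) JobOptions {
	opts := JobOptions{}
	if v, ok := annotations["slurmjob.slinky.slurm.net/shared"]; ok {
		opts.Shared = v
	}
	return opts
}
```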
Limitations
Allowing group workloads to use the shared annotation is out of scope.
Group workloads use a single placeholder job for multiple pods, with a fixed node count and a one-node-per-pod assignment. Allowing `shared` on them would require supporting Slurm packing (fewer nodes than pods), which would require changes to `PostFilter`, the `submitJob` node count, and `annotatePodsWithNodes`.
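To illustrate the packing gap, the node-count change can be sketched as a hypothetical computation (function names are illustrative, not from the slurm-bridge codebase):

```go
package main

// nodeCountCurrent reflects today's behavior: the placeholder job
// requests one node per pod.
func nodeCountCurrent(pods int) int {
	return pods
}

// nodeCountPacked is the hypothetical packed variant that shared group
// workloads would need: fewer nodes than pods when several pods fit on
// one node (ceiling division).
func nodeCountPacked(pods, podsPerNode int) int {
	return (pods + podsPerNode - 1) / podsPerNode
}
```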
Using group workloads with DRA poses additional challenges. Slurm-bridge currently assumes one pod per node per job: `PreBind` is called per pod with `(pod, nodeName)`, and `GetResources(ctx, pod, nodeName)` returns the job's allocation on that node from Slurm's `NodeResourceLayout`. One ResourceClaim is created per pod for that full allocation. With multiple pods on the same node, each pod should only receive a portion of the job's allocation.

Breaking Changes
All existing behavior is maintained by default. Only workloads that opt in via `slurmjob.slinky.slurm.net/shared` are affected.

Testing Notes
Additional Context