DRA: support dynamic device provisioning #5075

sunya-ch · 2025-01-23T05:57:58Z

Enhancement Description

One-line enhancement description (can be used as a release note): Enable DRA to support dynamically provisioned devices
Kubernetes Enhancement Proposal: KEP-5075: DRA: Dynamic Device Provisioning Support #5104
Discussion Link: Device management community call on 21 Jan 2025, starting at 35:40 (https://youtu.be/JdQZduy2pnc?si=iH13vbhQuYzwj1Wj&t=2140)
Primary contact (assignee): @sunya-ch
Responsible SIGs: SIG Node (SIG Network?)
Enhancement target (which target equals to which milestone):
- Alpha release target (x.y): 1.34
- Beta release target (x.y):
- Stable release target (x.y):
Alpha
- KEP (k/enhancements) update PR(s):
- Code (k/k) update PR(s):
- Docs (k/website) update PR(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

Note:

Motivation:

A device driver can create a new device and allocate it to a pod dynamically, based on a selected host device or profile.
The original use case is a CNI driver that calls macvlan or ipvlan to create a virtual device on top of a given master network device.

This enhancement builds on the benefits DRA offers over the conventional CNI approach, especially in a multi-network context, which I would like to highlight in user story 1 of the KEP draft. Assume a node has 2 x 10Gbps NICs. If one NIC has already been allocated to a pod, another pod can still request its own 10Gbps network without having to check which NIC is still unallocated and without hard-coding the master device name.
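
The two-NIC scenario can be sketched as follows. This is only a minimal illustration in Python, not the KEP's API: `HostNic`, `VirtualDevice`, `provision_virtual_device`, and the `macvlan-<nic>` naming are hypothetical, and exclusive allocation of each NIC is assumed for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostNic:
    """A physical NIC on the node (hypothetical model)."""
    name: str
    gbps: int
    allocated_to: Optional[str] = None  # pod name once the NIC is taken


@dataclass
class VirtualDevice:
    """A device provisioned on demand on top of a master NIC."""
    name: str
    master: str
    pod: str


def provision_virtual_device(nics, pod, min_gbps):
    """Pick any unallocated NIC that meets the request and derive a device from it."""
    for nic in nics:
        if nic.allocated_to is None and nic.gbps >= min_gbps:
            nic.allocated_to = pod
            return VirtualDevice(name=f"macvlan-{nic.name}", master=nic.name, pod=pod)
    return None  # no NIC on this node can satisfy the request


nics = [HostNic("eth0", 10), HostNic("eth1", 10)]
first = provision_virtual_device(nics, "pod-a", 10)
second = provision_virtual_device(nics, "pod-b", 10)
third = provision_virtual_device(nics, "pod-c", 10)
```

Neither pod names a master device: the request only states the required bandwidth, and the selection logic picks whichever NIC is still free. A third request finds no free NIC and gets nothing.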


bart0sh · 2025-01-23T07:25:12Z

/cc

ffromani · 2025-01-23T07:41:41Z

/cc

aojea · 2025-01-23T08:12:21Z

Please put the PR in draft mode if you want reviews

johnbelamaric · 2025-01-24T01:38:59Z

Thank you @sunya-ch, this is really interesting. As mentioned in the WG meeting, there are several KEPs that I think have some overlap, and I'd like to try to bring them together into a single solution, if possible.

In particular, this proposal has some similarities to a concept we have been discussing for a while which I call "per-device allocatable resources" (see this doc which discusses this concept among other things, though I am not sure I use that term in the doc). In that concept, the "capacities" of a given device can be allocatable, allowing sharing of the device in the same manner that node allocatable resources allow sharing of a node. In your case, you do the same thing, but you provision a new device to represent that set of shared capacity and reference the source device. That may be a useful construct; I need to think about it some more. With "per-device allocatable resources", we don't need an explicit provisioning limit; once all of a capacity is consumed, you can't allocate any more of it.
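
The capacity-consumption idea can be illustrated with a small Python sketch. It assumes a single bandwidth capacity per device; the `SharedDevice` class and its fields are hypothetical and do not correspond to any existing DRA API.

```python
class SharedDevice:
    """A device whose capacity is consumed piecewise by several claims (hypothetical)."""

    def __init__(self, name, capacity_gbps):
        self.name = name
        self.capacity_gbps = capacity_gbps
        self.consumed_gbps = 0

    def remaining(self):
        """Capacity that can still be handed out."""
        return self.capacity_gbps - self.consumed_gbps

    def allocate(self, request_gbps):
        """Consume part of the capacity; refuse once the request no longer fits."""
        if request_gbps <= 0:
            raise ValueError("request must be positive")
        if request_gbps > self.remaining():
            return False
        self.consumed_gbps += request_gbps
        return True


nic = SharedDevice("eth0", capacity_gbps=10)
results = [nic.allocate(4), nic.allocate(4), nic.allocate(4)]
```

There is no counter for how many allocations are allowed. The third request fails only because 4 Gbps no longer fits in the 2 Gbps that remain.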

Another aspect of this I see is that the need to "provision" creates a need for a lifecycle of resource claim actuation. This is similar to what is needed for #5007, except in the networking case it does not (I don't think?) need to happen before the pod gets scheduled. So that may not be quite the same thing.

My next step is to look at #5007, #5075, #4815, and a few other ideas that don't yet have enhancement issues, and start a doc where we can work through a design together.

cc @catblade

swatisehgal · 2025-01-27T16:00:35Z

/cc

ffromani · 2025-01-28T18:16:21Z

/cc

sunya-ch · 2025-01-29T07:17:32Z

@johnbelamaric Thank you so much for the pointer to related enhancement issues. I will walk through the list too.
Looking forward to the collaboration!

sunya-ch · 2025-01-29T07:20:10Z

> Please put the PR in draft mode if you want reviews

Thank you. I will create a PR in draft.

pohly · 2025-01-29T08:52:11Z

Please also update the issue description to use the normal KEP template (checklists, etc.). Then you can use the 5075 issue number as the number in your KEP PR.

johnbelamaric · 2025-01-29T16:10:10Z

@sunya-ch please create the draft PR and I will comment there. I was going to put together a doc but I think commenting is probably better. I think we might be able to solve this and @catblade's use cases with per-device allocatable resources. My current thinking is that this is distinct from the disaggregated device KEP, because your device creation is all still node local and therefore the lifecycle is the same as our existing device models. But once I have the PR I can provide a more thorough response.

bart0sh · 2025-01-29T17:39:57Z

/cc

sunya-ch · 2025-01-30T01:59:47Z

@pohly Thank you for your advice. I have updated the issue description.

@aojea @johnbelamaric I have created the draft PR #5104

k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 23, 2025

github-project-automation bot added this to SIG Node 1.33 KEPs planning Jan 23, 2025

aojea added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Jan 23, 2025

aojea self-assigned this Jan 23, 2025

haircommander moved this to Triage in SIG Node 1.33 KEPs planning Jan 24, 2025

haircommander moved this from Triage to Sig Node Consulting in SIG Node 1.33 KEPs planning Jan 28, 2025

sunya-ch mentioned this issue Jan 30, 2025

KEP-5075: DRA: Dynamic Device Provisioning Support #5104

Draft
