DRA: support dynamic device provisioning #5075

Open
4 tasks
sunya-ch opened this issue Jan 23, 2025 · 12 comments
Assignees
Labels
sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@sunya-ch

sunya-ch commented Jan 23, 2025

Enhancement Description

  • One-line enhancement description (can be used as a release note): Enable DRA to support dynamically provisioned devices
  • Kubernetes Enhancement Proposal: KEP-5075: DRA: Dynamic Device Provisioning Support #5104
  • Discussion Link: Device management community call on 21 Jan 2025 started at 35:40 (https://youtu.be/JdQZduy2pnc?si=iH13vbhQuYzwj1Wj&t=2140)
  • Primary contact (assignee): @sunya-ch
  • Responsible SIGs: SIG Node (SIG Network?)
  • Enhancement target (which target equals to which milestone):
    • Alpha release target (x.y): 1.34
    • Beta release target (x.y):
    • Stable release target (x.y):
  • Alpha
    • KEP (k/enhancements) update PR(s):
    • Code (k/k) update PR(s):
    • Docs (k/website) update PR(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

Note:

Motivation:

A device driver can dynamically generate a new device and allocate it to a pod, based on a selected host device or profile.
The original use case is a CNI driver that calls macvlan or ipvlan to generate a virtual device from a given master network device.

This enhancement brings the benefits of DRA over the conventional CNI approach, especially in a multi-network context, which I highlight in user story 1 of the KEP draft. Assume a node has 2 x 10Gbps NICs. If one NIC has already been allocated to a pod, another pod can request its own 10Gbps network without needing to check which NIC is still unallocated or to hard-code the master device name.
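
The two-NIC scenario can be sketched as follows. This is only an illustration of the idea and not the DRA API or the KEP's design: the `Nic` class, the `provision_virtual_device` function, and the device and pod names are all hypothetical, and a real driver would create a macvlan or ipvlan interface where this sketch merely returns a name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nic:
    """Hypothetical model of a host NIC that can act as a master device."""
    name: str
    bandwidth_gbps: int
    allocated_to: Optional[str] = None  # pod currently holding this NIC, if any


def provision_virtual_device(nics, pod, min_gbps):
    """Pick any unallocated NIC that satisfies the request and derive a
    virtual device name from it. Returns None if no NIC qualifies."""
    for nic in nics:
        if nic.allocated_to is None and nic.bandwidth_gbps >= min_gbps:
            nic.allocated_to = pod
            # A real driver would create the macvlan/ipvlan device here.
            return f"{nic.name}-macvlan-{pod}"
    return None


# Two 10Gbps NICs on one node; pods only state the bandwidth they need.
nics = [Nic("eth0", 10), Nic("eth1", 10)]
first = provision_virtual_device(nics, "pod-a", 10)
second = provision_virtual_device(nics, "pod-b", 10)
third = provision_virtual_device(nics, "pod-c", 10)
```

Neither pod names a master device: the second request lands on whichever NIC is still free, and the third finds none available.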

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 23, 2025
@bart0sh
Contributor

bart0sh commented Jan 23, 2025

/cc

@ffromani
Contributor

/cc

@aojea aojea added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Jan 23, 2025
@aojea aojea self-assigned this Jan 23, 2025
@aojea
Member

aojea commented Jan 23, 2025

Please put the PR in draft mode if you want reviews

@johnbelamaric
Member

Thank you @sunya-ch, this is really interesting. As mentioned in the WG meeting, there are several KEPs that I think have some overlap, and I'd like to try to bring them together into a single solution, if possible.

In particular, this proposal has some similarities to a concept we have been discussing for a while which I call "per-device allocatable resources" (see this doc which discusses this concept among other things, though I am not sure I use that term in the doc). In that concept, the "capacities" of a given device can be allocatable, allowing sharing of the device in the same manner that node allocatable resources allow sharing of a node. In your case, you do the same thing but you provision a new device to represent that set of shared capacity, and reference the source device. That may be a useful construct; I need to think about it some more. In the "per-device allocatable resources" model, we don't need an explicit provisioning limit; once all of a capacity is consumed, you can't allocate any more of it.
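
As a rough sketch of the "per-device allocatable resources" idea described above, assuming a single bandwidth capacity per device: the `SharedDevice` class, its methods, and the claim names are hypothetical and do not come from any Kubernetes API. The point is only that no separate provisioning limit is needed, because allocation fails once the capacity is used up.

```python
class SharedDevice:
    """Hypothetical device whose capacity is shared among several claims."""

    def __init__(self, name, capacity_gbps):
        self.name = name
        self.capacity_gbps = capacity_gbps
        self.claims = {}  # claim name -> Gbps consumed by that claim

    @property
    def remaining_gbps(self):
        return self.capacity_gbps - sum(self.claims.values())

    def allocate(self, claim, gbps):
        """Consume part of the capacity; refuse if not enough is left."""
        if gbps <= 0 or claim in self.claims or gbps > self.remaining_gbps:
            return False
        self.claims[claim] = gbps
        return True

    def release(self, claim):
        """Return a claim's share of the capacity to the device."""
        self.claims.pop(claim, None)


nic = SharedDevice("eth0", 10)
ok_a = nic.allocate("claim-a", 6)   # fits: 6 of 10 Gbps
ok_b = nic.allocate("claim-b", 6)   # refused: only 4 Gbps remain
ok_c = nic.allocate("claim-c", 4)   # fits: capacity is now fully consumed
```

Releasing `claim-a` would make 6 Gbps allocatable again, which is the same accounting that node allocatable resources use for sharing a node.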

Another aspect of this I see is that the need to "provision" creates a need for a lifecycle of resource claim actuation. This is similar to what is needed for #5007, except in the networking case it does not (I don't think?) need to happen before the pod gets scheduled. So that may not be quite the same thing.

My next step is to look at #5007, #5075, #4815, and a few other ideas that don't yet have enhancement issues, and start a doc where we can work through a design together.

cc @catblade

@swatisehgal
Contributor

/cc

@haircommander haircommander moved this from Triage to Sig Node Consulting in SIG Node 1.33 KEPs planning Jan 28, 2025
@ffromani
Contributor

/cc

@sunya-ch
Author

@johnbelamaric Thank you so much for the pointer to related enhancement issues. I will walk through the list too.
Looking forward to the collaboration!

@sunya-ch
Author

> Please put the PR in draft mode if you want reviews

Thank you. I will create a PR in draft.

@pohly
Contributor

pohly commented Jan 29, 2025

Please also update the issue description to use the normal KEP template (checklists, etc.). Then you can use the 5075 issue number as the number in your KEP PR.

@johnbelamaric
Member

@sunya-ch please create the draft PR and I will comment there. I was going to put together a doc but I think commenting is probably better. I think we might be able to solve this and @catblade's use cases with per-device allocatable resources. My current thinking is that this is distinct from the disaggregated device KEP, because your device creation is all still node local and therefore the lifecycle is the same as our existing device models. But once I have the PR I can provide a more thorough response.

@bart0sh
Contributor

bart0sh commented Jan 29, 2025

/cc

@sunya-ch
Author

@pohly Thank you for your advice. I have updated the issue description.

@aojea @johnbelamaric I have created the draft PR #5104

Projects
Status: Sig Node Consulting

8 participants