Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check usb/pci/gpu resource name #100

Merged
merged 3 commits into from
Jan 15, 2025
Merged

Conversation

Yu-Jack
Copy link
Contributor

@Yu-Jack Yu-Jack commented Dec 16, 2024

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Problem:
When creating or updating a non-existed pci/usb resource name, it's allowed. Besides, vGPU device is not checked either.

Solution:
We should check .DeviceName in VM spec while creating or updating. BTW, vGPU device is in the other field, so we need to check it in spec.template.domain.devices.gpus.

.DeviceName means resource name in k8s.

Related Issue:
harvester/harvester#6542

Test plan:

Test PCI Device

  1. Enable pcidevices-controller Add-on
  2. Enable one of PCI devices
  3. Create a VM and attach previous PCI device
  4. Click "Edit as YAML" button to manually edit YAML file, then click "Create" button.
  5. Modify the .DeviceName in the spec.template.domain.devices.hostDevices.

If you're not sure how to do it, please refer following demo recording.

pcidevice.mov

Test USB Device

  1. Enable pcidevices-controller Add-on
  2. Enable one of USB devices
  3. Create a VM and attach previous USB device
  4. Click "Edit as YAML" button to manually edit YAML file, then click "Create" button.
  5. Modify the .DeviceName in the spec.template.domain.devices.hostDevices.

If you're not sure how to do it, please refer following demo recording.

usbdevice.mov

@Yu-Jack Yu-Jack self-assigned this Dec 16, 2024
@Yu-Jack Yu-Jack changed the title feat: check usb/pci device and deviceName feat: check usb/pci/gpu device and deviceName Dec 17, 2024
@Yu-Jack Yu-Jack changed the title feat: check usb/pci/gpu device and deviceName feat: check usb/pci/gpu resource name and device name Dec 17, 2024
@Yu-Jack Yu-Jack changed the title feat: check usb/pci/gpu resource name and device name feat: check usb/pci/gpu resource name Dec 17, 2024
@Yu-Jack Yu-Jack force-pushed the HARV/6542 branch 5 times, most recently from 8967558 to f2fed1f Compare December 17, 2024 09:01
Signed-off-by: Jack Yu <jack.yu@suse.com>
return nil
}

func (vmValidator *vmDeviceHostValidator) validatePCIDevice(resourceName string) (found bool, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these validators check if there is a pci resource that exists in the cluster. Do we need to check the resource with a node in any situation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could keep this issue simple. So, I'll only check if the resource name exists in the cluster. For your case, if we decide to check node selector, we should consider node affinity. Besides, it's valid that same resource names are used by different VMs. We also need to consider checking if the resource is enough.

Comment on lines 177 to 185
foundInPCI, err := vmValidator.validatePCIDevice(gpu.DeviceName)

if err != nil {
return err
}

if !foundInPCI {
return fmt.Errorf("gpu device %s: resource name %s not found in pcidevice cache", gpu.Name, gpu.DeviceName)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jack,
Why do we validate GPUs with PCI devices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I know, the GPU is also bound to PCI device, so we can use PCI device to check it.

Copy link
Contributor Author

@Yu-Jack Yu-Jack Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we just mapped the vGPU to PCI device. Actually, It's like USB device. We can pass whole USB controller to VM with PCI device, or just pass one of USB ports to VM with USB passthrough.

Signed-off-by: Jack Yu <jack.yu@suse.com>
Copy link
Member

@WebberHuang1118 WebberHuang1118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a nit, thanks.

@@ -108,3 +125,80 @@ func (vmValidator *vmDeviceHostValidator) validateDevicesFromSameNodes(vmObj *ku

return nil
}

func (vmValidator *vmDeviceHostValidator) validateHostDevices(vmObj *kubevirtv1.VirtualMachine) error {
for _, hostDevice := range vmObj.Spec.Template.Spec.Domain.Devices.HostDevices {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object access path vmObj.Spec.Template.Spec.Domain.Devices.HostDevices is quite long, maybe we can consider extracting it as a utility function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extract it later if we truly utilize a significant number of these paths.

}

func (vmValidator *vmDeviceHostValidator) validateGPUs(vmObj *kubevirtv1.VirtualMachine) error {
for _, gpu := range vmObj.Spec.Template.Spec.Domain.Devices.GPUs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Signed-off-by: Jack Yu <jack.yu@suse.com>
Copy link
Collaborator

@ibrokethecloud ibrokethecloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. thanks.

@Yu-Jack Yu-Jack merged commit f40c93a into harvester:master Jan 15, 2025
5 checks passed
@Yu-Jack
Copy link
Contributor Author

Yu-Jack commented Jan 15, 2025

@mergify backport v1.3

Copy link

mergify bot commented Jan 15, 2025

backport v1.3

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants