infiniBand SRI-OV CNI failed to configure VF "VF ib2 GUID is not valid" #307

Closed
seb-835 opened this issue May 20, 2022 · 10 comments

@seb-835

seb-835 commented May 20, 2022

Hi Team,

I think I am really close to getting it to work,
but I got this when describing my test pod:

  Normal   AddedInterface          2s    multus             Add eth0 [10.233.117.195/32] from cni0
  Warning  FailedCreatePodSandBox  1s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "88aee8b5e04d60a0fdfe9437888521e2f49a170b67ce574adf571f05f644bf74": [default/test-sriov-ib-pod:example-sriov-ib-network]: error adding container to network "example-sriov-ib-network": infiniBand SRI-OV CNI failed to configure VF "VF ib2 GUID is not valid"

I use the following manifests to test:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: infiniband-sriov
  namespace: cattle-sriov-system 
spec:
  deviceType: netdevice
  mtu: 1500
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    deviceID: "101c"
  linkType: ib
  isRdma: true
  numVfs: 4 
  priority: 90
  resourceName: mlnxnics
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: example-sriov-ib-network
  namespace: cattle-sriov-system
spec:
  ipam: |
    {
     "type": "whereabouts",
     "range": "192.168.5.225/28"
    }
  resourceName: mlnxnics
  linkState: enable
  networkNamespace: default
---
apiVersion: v1
kind: Pod
metadata:
  name: test-sriov-ib-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: example-sriov-ib-network
spec:
  containers:
    - name: test-sriov-ib-pod
      image: centos/tools
      imagePullPolicy: IfNotPresent
      command:
        - sh
        - -c
        - sleep inf
      securityContext:
        capabilities:
          add: [ "IPC_LOCK" ]
      resources:
        requests:
          rancher.io/mlnxnics: "1"
        limits:
          rancher.io/mlnxnics: "1"

Can you give me advice on how to fix it?
Thanks a lot.

@e0ne e0ne self-assigned this May 23, 2022
@SchSeba
Collaborator

SchSeba commented Aug 11, 2022

Hi @e0ne @seb-835, is there any update on this issue, or can we close it?

@seb-835
Author

seb-835 commented Aug 11, 2022

No update on this case; I am still having the issue. Any help solving it would be appreciated.

@adrianchiris
Collaborator

adrianchiris commented Aug 11, 2022

Greetings!

After the node's SR-IOV configuration is applied by the config daemon, and before an IB workload is scheduled on the node, what are the VFs' hardware addresses?

According to the CNI failure, it seems they are all zeroes or all ones:

https://github.com/k8snetworkplumbingwg/ib-sriov-cni/blob/5473e6b97fa532233221a5e2ee06aa182457ffc0/pkg/sriov/sriov.go#L259

What OS and kernel are you using?
Maybe the kernel does not support getting/setting the VF port and node GUIDs.
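
For readers following along, the condition described above (a VF GUID that is all zeroes or all ones is treated as not configured) can be sketched in a few lines. This is a minimal Python illustration under that assumption, not the plugin's Go code; the function name `is_valid_guid` and the sample GUIDs are invented for the example.

```python
import re

# An InfiniBand GUID is 8 bytes, written here as colon-separated hex pairs.
GUID_RE = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){7}")


def is_valid_guid(guid: str) -> bool:
    """Return False for a GUID that is malformed, all zeroes, or all ones."""
    guid = guid.strip().lower()
    if not GUID_RE.fullmatch(guid):
        return False
    octets = bytes.fromhex(guid.replace(":", ""))
    # Assumption taken from the comment above: all-zero and all-one GUIDs
    # mean the VF was never given a real hardware address.
    return octets not in (b"\x00" * 8, b"\xff" * 8)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the reporter's node.
    for sample in (
        "00:00:00:00:00:00:00:00",
        "ff:ff:ff:ff:ff:ff:ff:ff",
        "02:00:00:00:00:00:00:01",
    ):
        print(sample, is_valid_guid(sample))
```

If the VFs on the node report one of the two rejected patterns before any pod is scheduled, the failure would point at node-level configuration rather than at the pod manifest.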

@adrianchiris
Collaborator

In the sriov-network-config-daemon logs, do you see an error after the "setVfGuid()" log message?

Can you add the sriov-network-config-daemon logs from when it tries to configure SR-IOV for the node?
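
One way to answer the question above is to save the daemon log to a file and look at the lines that follow each "setVfGuid()" message. The sketch below is a hedged Python helper for that; the function names, the `context` parameter, and the sample log lines are all hypothetical, and the real daemon output will look different.

```python
def lines_after_marker(log_text, marker="setVfGuid()", context=3):
    """Return (marker_line, following_lines) for each line containing marker."""
    lines = log_text.splitlines()
    return [
        (line, lines[i + 1 : i + 1 + context])
        for i, line in enumerate(lines)
        if marker in line
    ]


def error_follows_marker(log_text, marker="setVfGuid()", context=3):
    """True if any of the lines right after a marker line mentions an error."""
    return any(
        "error" in follower.lower()
        for _, followers in lines_after_marker(log_text, marker, context)
        for follower in followers
    )


# Made-up log excerpt, only to exercise the helper.
SAMPLE_LOG = """\
I0811 10:00:00 setVfGuid(): configuring VF 0
E0811 10:00:01 hypothetical error: operation not supported
I0811 10:00:02 unrelated message
"""

if __name__ == "__main__":
    for marker_line, followers in lines_after_marker(SAMPLE_LOG):
        print(marker_line)
        for follower in followers:
            print("   ", follower)
    print("error after marker:", error_follows_marker(SAMPLE_LOG))
```

A plain substring match on "error" is deliberately crude; it is meant to narrow down which part of the log to paste into the issue, not to diagnose the failure.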

@fu7100

fu7100 commented Aug 18, 2022

I have the same problem and don't know how to solve it. If anyone knows how to solve it, please contact me; my email is fu7100@gmail.com.

@SchSeba
Collaborator

SchSeba commented Dec 19, 2022

Hi @seb-835 @fu7100, any update on this issue? We are waiting for some logs.
If you manage to make it work, let me know and I will close this issue.
Thanks!

@frye233

frye233 commented Mar 13, 2023

I have the same problem: "infiniBand SRI-OV CNI failed to configure VF "VF ib9 GUID is not valid"". I worked around it by manually configuring the node, port and policy of the VF. However, I am puzzled: the plug-in should configure this VF information automatically instead of requiring me to do it manually. What is the reason for this? Can you help me solve it?
Thank you very much.
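
The comment above does not say which commands were used. One plausible reading, offered here as an assumption only, is the per-VF `node`, `port` and `policy` files that the mlx5 driver exposes under sysfs. The Python sketch below only builds the list of writes as a dry run and touches nothing on the system; the device name `mlx5_0`, the GUID values, and the helper name `vf_guid_plan` are hypothetical.

```python
from pathlib import PurePosixPath


def vf_guid_plan(ib_device, vf_index, node_guid, port_guid, policy="Follow"):
    """Return (sysfs_path, value) pairs for one VF, without writing anything.

    Assumes the mlx5 sysfs layout
    /sys/class/infiniband/<device>/device/sriov/<vf>/{node,port,policy};
    verify that these files exist on the node before relying on this.
    """
    base = (
        PurePosixPath("/sys/class/infiniband")
        / ib_device
        / "device"
        / "sriov"
        / str(vf_index)
    )
    return [
        (str(base / "node"), node_guid),
        (str(base / "port"), port_guid),
        (str(base / "policy"), policy),
    ]


if __name__ == "__main__":
    # Hypothetical device and GUIDs, for illustration only.
    plan = vf_guid_plan(
        "mlx5_0", 0, "02:00:00:00:00:00:00:01", "02:00:00:00:00:00:01:01"
    )
    for path, value in plan:
        print(f"would write {value!r} to {path}")
```

Printing the plan first makes it easy to compare against what the operator's config daemon is expected to set, which is the open question in this thread.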

@cumulus-joeyyang

I have the same problem: "infiniBand SRI-OV CNI failed to configure VF "VF ib9 GUID is not valid"". I worked around it by manually configuring the node, port and policy of the VF. However, I am puzzled: the plug-in should configure this VF information automatically instead of requiring me to do it manually. What is the reason for this? Can you help me solve it? Thank you very much.

Hi @frye233

Could you tell me how you manually configured the node/port GUID of the VF? I have the same issue with a raw ib-sriov-cni and device plugin (dp) deployment. In addition, the VFs I created all remain DOWN, and I don't know how to bring them up even though the IB PF is UP. Thanks.

@SchSeba
Collaborator

SchSeba commented Dec 21, 2023

Any update on this issue, or can we close it?

@SchSeba
Collaborator

SchSeba commented Aug 20, 2024

No update, so I am closing this one.

@SchSeba SchSeba closed this as completed Aug 20, 2024