Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

udev: mark the SRIO-V functions as unamanged by NetworkManager #1622

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

lkundrak
Copy link

@lkundrak lkundrak commented Aug 27, 2019

Description

From https://bugzilla.redhat.com/show_bug.cgi?id=1720157:

Accelerated Networking on Azure exposes a new SRIOV interface to the VM.
This interface is transparently bonded to the synthetic interface,
so NetworkManager should just ignore any SRIOV interfaces.

The rule is originally from Haiyang Zhang haiyangz@microsoft.com.


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • Except for special cases involving multiple contributors, the PR is started from a fork of the main repository, not a branch.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and Travis.CI is passing.

Quality of Code and Contribution Guidelines


This change is Reviewable

@msftclas
Copy link

msftclas commented Aug 27, 2019

CLA assistant check
All CLA requirements met.

@codecov
Copy link

codecov bot commented Aug 27, 2019

Codecov Report

❗ No coverage uploaded for pull request base (develop@89e2b8c). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop    #1622   +/-   ##
==========================================
  Coverage           ?   66.17%           
==========================================
  Files              ?       78           
  Lines              ?    11175           
  Branches           ?     1578           
==========================================
  Hits               ?     7395           
  Misses             ?     3445           
  Partials           ?      335

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89e2b8c...01c7e3f. Read the comment docs.

@codecov
Copy link

codecov bot commented Aug 27, 2019

Codecov Report

Merging #1622 into develop will increase coverage by 0.99%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1622      +/-   ##
===========================================
+ Coverage    68.16%   69.15%   +0.99%     
===========================================
  Files           80       82       +2     
  Lines        11609    11711     +102     
  Branches      1631     1642      +11     
===========================================
+ Hits          7913     8099     +186     
+ Misses        3359     3261      -98     
- Partials       337      351      +14
Impacted Files Coverage Δ
azurelinuxagent/common/cgroupstelemetry.py 92.03% <0%> (-4.54%) ⬇️
azurelinuxagent/common/logger.py 90.79% <0%> (-3.07%) ⬇️
azurelinuxagent/ga/remoteaccess.py 88.88% <0%> (-1.21%) ⬇️
azurelinuxagent/common/protocol/restapi.py 87.95% <0%> (-0.94%) ⬇️
azurelinuxagent/common/protocol/util.py 83.41% <0%> (-0.9%) ⬇️
azurelinuxagent/agent.py 44.65% <0%> (-0.63%) ⬇️
azurelinuxagent/common/cgroup.py 90.15% <0%> (-0.53%) ⬇️
azurelinuxagent/common/rdma.py 16.84% <0%> (-0.31%) ⬇️
azurelinuxagent/common/protocol/metadata.py 53.33% <0%> (-0.05%) ⬇️
azurelinuxagent/common/event.py 84.66% <0%> (ø) ⬆️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ba8202...0d8a01d. Read the comment docs.

@larohra
Copy link
Contributor

larohra commented Sep 4, 2019

DCR Tests (End to end tests) passed for this change

@narrieta
Copy link
Member

@trstringer could you take a look? thanks!

@narrieta narrieta requested a review from trstringer October 29, 2019 22:41
@narrieta narrieta removed their assignment Oct 29, 2019
@narrieta narrieta removed their request for review October 29, 2019 22:41
@lkundrak
Copy link
Author

lkundrak commented Nov 7, 2019

ping ping ping

@lkundrak
Copy link
Author

lala lala

@trstringer
Copy link
Contributor

trstringer commented Nov 20, 2019

@lkundrak deploying a VM with the redhat:rhel:8:latest image I see this rule in the image already. You think this should apply to all distros? What about those that don't use NetworkManager? This udev rule seems very NetworkManager-specific.

@lkundrak
Copy link
Author

Thanks for the response.

@lkundrak deploying a VM with the redhat:rhel:8:latest image I see this rule in the image already.

Yes, but that's essentially a hack.

You think this should apply to all distros?

Yes. Essentially all distros these days provide an option to use NetworkManager, even if they happen not to install it by default.

What about those that don't use NetworkManager? This udev rule seems very NetworkManager-specific.

Most importantly, the rule is Azure-specific. Without NetworkManager, the rule is somewhat unnecessary, yet it can't do any harm. It would do harm though, if it was present outside Azure, which is why it can't be shipped with NetworkManager.

@trstringer
Copy link
Contributor

You bring up a good point. Ok let me do some testing and see what this looks like in other distros with other network services, and also we need to make sure this udev rule doesn't break any other networking configurations inadvertently. I'll test a few different scenarios and let you know what I find.

Thanks for bringing this to our attention!

@trstringer
Copy link
Contributor

So after much testing and investigation, I'm confident that this approach is a good one and I think we should move forward into getting this udev rule pushed into upstream. For documentation purposes, my findings include looking at how other distros and network services handle this.

For instance, with Debian Stretch (with backports) the reason that this isn't a problem is because it doesn't utilize NetworkManager, but uses networking.service, which behind the scenes does ifup and ifdown execs. This relies on /etc/network/interfaces, which includes similar configuration:

$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The normal eth0
auto eth0
iface eth0 inet dhcp
 up sleep 5; dhclient -1 -6 -nw -cf /etc/dhcp/dhclient6.conf -lf /var/lib/dhcp/dhclient6.eth0.leases -v eth0 || true # ignore failure

# Maybe the VM has 2 NICs?
allow-hotplug eth1
iface eth1 inet dhcp

# Maybe the VM has 3 NICs?
allow-hotplug eth2
iface eth2 inet dhcp

So the reason why other non-NetworkManager implementations don't need to ignore/unmanage the SRIOV interface is because they are more explicit in which interfaces they will manage.

Finally, what I like about this improvement is that if a user decides to utilize NetworkManager in the future for a VM or image, this udev rule will already exist, providing a seamless transition.

@lkundrak can you rebase upstream develop branch on yours? Once this is done, I'll do a few things and then we can move forward.

From https://bugzilla.redhat.com/show_bug.cgi?id=1720157:

Accelerated Networking on Azure exposes a new SRIOV interface to the VM.
This interface is transparently bonded to the synthetic interface,
so NetworkManager should just ignore any SRIOV interfaces.

The rule is originally from Haiyang Zhang <haiyangz@microsoft.com>.
@lkundrak
Copy link
Author

lkundrak commented Dec 8, 2019

So the reason why other non-NetworkManager implementations don't need to ignore/unmanage the SRIOV interface is because they are more explicit in which interfaces they will manage.

Yes. Note that NetworkManager can be configured to behave in the same way (not attempt to configure interfaces unless asked to do so), by installing the NetworkManager-config-server package. That is in general not recommended, because it needs more manual tinkering.

@lkundrak can you rebase upstream develop branch on yours? Once this is done, I'll do a few things and then we can move forward.

I don't see why would I need to? The branch merges cleanly, and I'm not able to test the changes because my Azure account expired and I don't really have a time to spend on this any more (I'm now on a parental leave).

@vrdmr
Copy link
Member

vrdmr commented Dec 9, 2019

@lkundrak can you rebase upstream develop branch on yours? Once this is done, I'll do a few things and then we can move forward.

I don't see why would I need to? The branch merges cleanly, and I'm not able to test the changes because my Azure account expired and I don't really have a time to spend on this any more (I'm now on a parental leave).

Hi @lkundrak - Updated branch would help us test your changes against our E2E testing automation - so that the no old changes are tested (and throw us off if there are any issues).

If you want, I can do the rebase if you don't have time? Please let me know.

@vittyvk
Copy link
Contributor

vittyvk commented Jan 14, 2020

If you want, I can do the rebase if you don't have time? Please let me know.

Lubomir is currently on a long term leave, I can pick this up from Red Hat side. Please let me know if anything else from us is needed to make this merged. Thanks!

@lkundrak
Copy link
Author

@lkundrak can you rebase upstream develop branch on yours?

I assume you meant rebase on develop. Done. Note that I haven't tested the results, but there were no conflicts.

@vrdmr
Copy link
Member

vrdmr commented Jan 20, 2020

@trstringer Can you take a look? I'll also run couple of runs from our Test Automation.

@trstringer
Copy link
Contributor

@vrdmr Looks good to me.

# Accelerated Networking on Azure exposes a new SRIOV interface to the VM.
# This interface is transparently bonded to the synthetic interface,
# so NetworkManager should just ignore any SRIOV interfaces.
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add|change|move", ENV{NM_UNMANAGED}="1"

I was testing out a version of this over in coreos/fedora-coreos-config#2176 and wasn't observing NM_UNMANAGED=1 getting set in the initramfs until I added the change|move, which is what is done over in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/38d3834e2c464bd4820392800a3a1fe3f286fc62/data/85-nm-unmanaged.rules#L5

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lkundrak WDYT?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would NOT recommend this change without a driver check (mlx4, mlx5). There is potential for hv_pci devices to be managed. We're working on a better story wrt MANA: https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-mana-linux

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjp256 do you have an example modification to this rule to achieve that goal?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants