-
Notifications
You must be signed in to change notification settings - Fork 257
fix: fixing Stateless CNI delete in SwiftV2 scenario #3967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes issues with Stateless CNI delete operations in SwiftV2 scenarios by modifying endpoint ID generation and improving the delete flow. The changes ensure proper distinction between different NIC types and provide necessary context for transparent client operations.
- Modifies
GetEndpointID
to accept aNICType
parameter and append interface name for delegated NICs - Updates delete flow to use proper network manager clients and adds
NetNsPath
to state information - Adds
NetworkNameSpace
field to CNS REST server structures for frontend NIC support
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
network/manager.go | Core logic changes for endpoint ID generation and delete flow improvements |
network/manager_mock.go | Mock implementation updated to match new GetEndpointID signature |
cns/restserver/restserver.go | Added NetworkNameSpace field to IPInfo struct |
cns/restserver/ipam.go | Updated validation and state management for NetworkNameSpace field |
cni/network/network.go | Updated callers to pass NICType parameter to GetEndpointID |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a lot of context on the need for these changes. The changes look fine, but we should improve the PR a bit -
- Update the description to cover why the opMode change
- Address other copilot comments (one about NICType in particular seems serious)
- Add a description of what validation steps have been carried out. Include screenshots/logs if necessary.
- Add tests.
cni/network/network.go
Outdated
} else { | ||
ifName = "eth" + strconv.Itoa(opt.endpointIndex) | ||
endpointID = plugin.nm.GetEndpointID(opt.args.ContainerID, ifName) | ||
endpointID = plugin.nm.GetEndpointID(opt.args.ContainerID, ifName, opt.ifInfo.NICType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what scenarios was this enabled in beforehand/currently?
Windows swift v1 (single nic)? y/n
Windows swift v2 (multi nic)? y/n
Linux swift v1 (single nic)? y/n
Linux swift v2 (multi nic)? y/n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been enabled in all swiftV2 multi nic scenarios and also singularity.
In the new commit a new API defined GetEndpointIDByNicType to avoid touching the previous one and also it only make changes to the containerID in statelessCNI DElegatedNIC case.
52b1ed1
to
a6b378e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preliminary suggestions-- could you also add how you tested and what happens without each of these changes?
}() | ||
// TODO: For stateless cni linux, check if delegated vmnic type, and if so, delete using this *endpoint* struct's ifname | ||
// For stateless cni linux, check if delegated vmnic type, and if so, delete using this *endpoint* struct's ifname | ||
if ep.NICType == cns.DelegatedVMNIC { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall we had swiftv2 without stateless previously-- why was this not required then?
Additionally, why do we use cns.DelegatedVMNIC here when we use cns.NodeNetworkInterfaceFrontendNIC
when selecting the endpoint in endpoint_linux.go
. They both have the same value. It seems like secondary endpoint clients are created only when the nic type is of value FrontendNIC anyway so why this check?
if you're going by the comment it might be outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeNetworkInterfaceFrontendNIC has been used instead in new commit to be consistenet with CNS contract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you address the first two questions?:
I recall we had swiftv2 without stateless previously-- why was this not required then?
It seems like secondary endpoint clients are created only when the nic type is of value FrontendNIC anyway so why this check?
// TODO: SecondaryInterfaces map should be retired and only IfName field and NICType should be used to determine the delgated NIC | ||
for iface := range ep.SecondaryInterfaces { | ||
if err := client.netlink.SetLinkNetNs(iface, uintptr(vmns)); err != nil { | ||
if err := client.moveInterfaceToHostNetns(iface, vmns); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the function mentions it moves the interface to the host ns but it seems like we still need to provide the ns id (an int)-- wondering if either 1) this should have a more general name (as it just moves an interface to any netns id) or 2) just keep it in-line w/o a new func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to a different function since this prosess is being called multiple times in this files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the name and comments for the function to like moveInterfaceToNetns? The int passed in is not necessarily the vm's namespace id. Sure it's called multiple times but we still need to handle the error-- you're replacing a function call with another function call that seems to do the exact same thing?
UpdateEndpoint(networkID string, existingEpInfo *EndpointInfo, targetEpInfo *EndpointInfo) error | ||
GetNumberOfEndpoints(ifName string, networkID string) int | ||
GetEndpointID(containerID, ifName string) string | ||
GetEndpointIDByNicType(containerID, ifName string, nicType cns.NICType) string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: GetEndpointIDByNICType as nic is an acronym
cni/network/network.go
Outdated
} else { | ||
for i, epInfo := range epInfos { | ||
logger.Info("Found endpoint to delete", zap.String("IfName", epInfo.IfName), zap.String("EndpointID", epInfo.EndpointID), zap.Any("NICType", epInfo.NICType)) | ||
epInfos[i].NetNsPath = args.Netns // in case DelegatedNIC need to be moved to host namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is used in secondary endpoint client (I believe via epInfo.NetNsPath --> ep.NetworkNameSpace ) , could you add a comment mentioning that's how it is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this modification after we get the state be better suited for GetEndpointState (similar to how we pass in networkID and then populate it in each epInfo?)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not want to change the function for getEndpointState(). The comment has been added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you mention it's used in the secondary endpoint client-- unless it's also being used in windows? Also if it does not change the logic, can you populate the epinfos right after you call GetEndpointState
4b7c752
to
5c01db3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that cni/network/* has some issues with tests? Also please respond to the previous comments I made.
cni/network/network.go
Outdated
// When that happens, kubelet and containerd hang during sandbox creation or teardown. | ||
// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace, | ||
// triggering hot-unplug/re-register events and leaving the node in an unhealthy state. | ||
plugin.nm.RemoveSecondaryEndpointFromPodNetNS(args.IfName, args.Netns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why we cannot delete using the normal delete flow and instead call this new method? What info is missing when we cannot retrieve data from CNS? If we can delete everything on CNI side without the CNS (which seems to be the case here), why have this new method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In normal Stateless CNI flow we need the delegated NIC Name which will be missing in this case. Since Stateless CNI cannot retrieve the state it cannot go through normal delete flow and we need this special case function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you can delete the secondary nics even without the delegated NIC name though, based on this function RemoveSecondaryEndpointFromPodNetNS
-- why can't this be the normal flow?
network/manager.go
Outdated
} | ||
|
||
// RemoveSecondaryEndpointFromPodNetNS removes the secondary endpoint from the pod netns | ||
func (nm *networkManager) RemoveSecondaryEndpointFromPodNetNS(ifName string, netns string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems really odd to me-- manager.go is meant to deal with general state, endpoint creation/deletion that can apply to all endpoint/nic type scenarios, but secondary endpoints (add/remove) are an implementation detail that applies to only one of the various scenarios. This logic should be at the client level (like changes should be in secondary_endpoint_client_linux only and it should decide the how to handle it based on the parameters passed in). This applies to all the RemoveSecondaryEndpointFromPodNetNS
and removeSecondaryEndpointFromPodNetNSImpl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, but all of the call to platform related functions are called from manager.go. I did not see a direct call to Implementation from CNI package.
|
||
// removeInterfacesfromNetnsPath finds and removes all interfaces from the specified netns path except the infra and non-eth interfaces. | ||
func (client *SecondaryEndpointClient) RemoveInterfacesfromNetnsPath(infraInterfaceName string, netnspath string) error { | ||
// Get VM namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make a helper function for this namespace changing? Or if one already exists use that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already created a helper function to remove the interface form the netnsPath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that RemoveInterfacesfromNetnsPath is a helper, but noticed that the code here https://github.com/Azure/azure-container-networking/pull/3967/files#diff-a2a6265dc83a2c15c13849f9f3f4b1f2758e06f0973093002e0077dd8f20d4efR155 also opens a network namespace-- this is what I am referring to
// ListIfNamesInNetns returns all interface names in the given netns path. | ||
func listIfNamesInNetns(ns NamespaceInterface) ([]string, error) { | ||
// Use the existing netlink interface to list links | ||
links, err := vishnetlink.LinkList() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this can be mocked? ex: in your test we have a mock vishnetlink with that one method you're using LinkList and we return something that satisfies the vishnetlink.Link interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tired to mock this but still even mocking this is not enough to create a UT for the new releaseSecodnary function.
if iface == infraInterfaceName || !strings.Contains(iface, "eth") { | ||
continue | ||
} | ||
if err := client.MoveInterfaceToHostNetns(iface, vmns); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm so why do we leave the infra eth0 in the pod namespace? Does the eth0 (infra) get cleaned up later on in the delete call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it like during async delete, we only have 1 endpoint info received (since can't contact CNS for others) whereas if CNS were up, we'd get 1 endpoint info for each (1 for infra, 1 for each delegated) and so we'd be able to rely on the existing logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. Eth0 does not suffer from this issue similar to secondary interface and once the NETNS get removed i is working fine. For delgated interface like eth1 we need to find it and moving it back to host
b342517
to
0118a49
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
0118a49
to
8bfda69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
8bfda69
to
3f3e4a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
3f3e4a5
to
ab9e3ce
Compare
ab9e3ce
to
6fbcfd0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behzad-mir lets discuss offline on these changes. i feel it can be done in another way
cni/network/network.go
Outdated
// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace, | ||
// triggering hot-unplug/re-register events and leaving the node in an unhealthy state. | ||
// This workaround mitigates the issue by removing the secondary NIC from the pod netns when CNS is unreachable during DEL to provide the endpoint state. | ||
if err = plugin.nm.RemoveSecondaryEndpointFromPodNetNS(args.IfName, args.Netns); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should still create file for cns to release ip allocated for pod
119e9bf
to
d5e59f1
Compare
d5e59f1
to
b364575
Compare
A number of changes is made to stateless CNI to fully support SWiftV2 in Linux:
For validating the scenario ADD/Delete calls have been issues on SwiftV2 cluster and and logs and satefile has been analyzed to make sure it is consistent with Stateful CNI and also nothing gets leaked.
Requirements:
Notes: