Skip to content

Conversation

behzad-mir
Copy link
Contributor

@behzad-mir behzad-mir commented Aug 27, 2025

A number of changes is made to stateless CNI to fully support SWiftV2 in Linux:

  1. For Stateless CNI, the EndpointID should include ifName when the NicType is Delegated. This ensures it can be distinguished from the InfraNIC endpoint. The reason for this behavior is that Stateless CNI uses only the ContainerID as the endpoint ID when saving the state file in CNS, to maintain consistency with Cilium. However, within CNI’s in-memory representation, the EndpointID must uniquely identify each NIC. As a result:
    • We always pass the ContainerID to CNS as the endpoint state key.
    • Internally, CNI uses EndpointID (which includes ifName for delegated NICs) to differentiate endpoints.
  2. Delete flow has been revised and NetNSPath has been added to the statefile since it is needed by the TransparentClient for Frontend NIC.
  3. The Transparent mode used by statefull CNI for SWiftV1 and V2 and stateless CNI should follow the same. TransparentVlan which is the original value seems to be a mistake.
  4. Async delete file creation in Stateless CNI has been moved to CNS_invoker to be consistent with Stateful CNI.
  5. Removing secondary interface from pod namespace in SwiftV2 Linux in case of asynchronous delete. This is needed to be covered for stateless CNI since if CNS is down, CNI does not have the secondary interface name. A dedicated function has been added to list the interface in pod namespace and remove the secondary.

For validating the scenario ADD/Delete calls have been issues on SwiftV2 cluster and and logs and satefile has been analyzed to make sure it is consistent with Stateful CNI and also nothing gets leaked.

Requirements:

Notes:

@Copilot Copilot AI review requested due to automatic review settings August 27, 2025 07:19
@behzad-mir behzad-mir requested review from a team as code owners August 27, 2025 07:19
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes issues with Stateless CNI delete operations in SwiftV2 scenarios by modifying endpoint ID generation and improving the delete flow. The changes ensure proper distinction between different NIC types and provide necessary context for transparent client operations.

  • Modifies GetEndpointID to accept a NICType parameter and append interface name for delegated NICs
  • Updates delete flow to use proper network manager clients and adds NetNsPath to state information
  • Adds NetworkNameSpace field to CNS REST server structures for frontend NIC support

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
network/manager.go Core logic changes for endpoint ID generation and delete flow improvements
network/manager_mock.go Mock implementation updated to match new GetEndpointID signature
cns/restserver/restserver.go Added NetworkNameSpace field to IPInfo struct
cns/restserver/ipam.go Updated validation and state management for NetworkNameSpace field
cni/network/network.go Updated callers to pass NICType parameter to GetEndpointID

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@santhoshmprabhu santhoshmprabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a lot of context on the need for these changes. The changes look fine, but we should improve the PR a bit -

  1. Update the description to cover why the opMode change
  2. Address other copilot comments (one about NICType in particular seems serious)
  3. Add a description of what validation steps have been carried out. Include screenshots/logs if necessary.
  4. Add tests.

} else {
ifName = "eth" + strconv.Itoa(opt.endpointIndex)
endpointID = plugin.nm.GetEndpointID(opt.args.ContainerID, ifName)
endpointID = plugin.nm.GetEndpointID(opt.args.ContainerID, ifName, opt.ifInfo.NICType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what scenarios was this enabled in beforehand/currently?
Windows swift v1 (single nic)? y/n
Windows swift v2 (multi nic)? y/n
Linux swift v1 (single nic)? y/n
Linux swift v2 (multi nic)? y/n

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been enabled in all swiftV2 multi nic scenarios and also singularity.
In the new commit a new API defined GetEndpointIDByNicType to avoid touching the previous one and also it only make changes to the containerID in statelessCNI DElegatedNIC case.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@QxBytes QxBytes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preliminary suggestions-- could you also add how you tested and what happens without each of these changes?

}()
// TODO: For stateless cni linux, check if delegated vmnic type, and if so, delete using this *endpoint* struct's ifname
// For stateless cni linux, check if delegated vmnic type, and if so, delete using this *endpoint* struct's ifname
if ep.NICType == cns.DelegatedVMNIC {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall we had swiftv2 without stateless previously-- why was this not required then?

Additionally, why do we use cns.DelegatedVMNIC here when we use cns.NodeNetworkInterfaceFrontendNIC when selecting the endpoint in endpoint_linux.go. They both have the same value. It seems like secondary endpoint clients are created only when the nic type is of value FrontendNIC anyway so why this check?

if you're going by the comment it might be outdated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeNetworkInterfaceFrontendNIC has been used instead in new commit to be consistenet with CNS contract

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you address the first two questions?:
I recall we had swiftv2 without stateless previously-- why was this not required then?
It seems like secondary endpoint clients are created only when the nic type is of value FrontendNIC anyway so why this check?

// TODO: SecondaryInterfaces map should be retired and only IfName field and NICType should be used to determine the delgated NIC
for iface := range ep.SecondaryInterfaces {
if err := client.netlink.SetLinkNetNs(iface, uintptr(vmns)); err != nil {
if err := client.moveInterfaceToHostNetns(iface, vmns); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function mentions it moves the interface to the host ns but it seems like we still need to provide the ns id (an int)-- wondering if either 1) this should have a more general name (as it just moves an interface to any netns id) or 2) just keep it in-line w/o a new func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to a different function since this prosess is being called multiple times in this files

Copy link
Contributor

@QxBytes QxBytes Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the name and comments for the function to like moveInterfaceToNetns? The int passed in is not necessarily the vm's namespace id. Sure it's called multiple times but we still need to handle the error-- you're replacing a function call with another function call that seems to do the exact same thing?

UpdateEndpoint(networkID string, existingEpInfo *EndpointInfo, targetEpInfo *EndpointInfo) error
GetNumberOfEndpoints(ifName string, networkID string) int
GetEndpointID(containerID, ifName string) string
GetEndpointIDByNicType(containerID, ifName string, nicType cns.NICType) string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: GetEndpointIDByNICType as nic is an acronym

} else {
for i, epInfo := range epInfos {
logger.Info("Found endpoint to delete", zap.String("IfName", epInfo.IfName), zap.String("EndpointID", epInfo.EndpointID), zap.Any("NICType", epInfo.NICType))
epInfos[i].NetNsPath = args.Netns // in case DelegatedNIC need to be moved to host namespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is used in secondary endpoint client (I believe via epInfo.NetNsPath --> ep.NetworkNameSpace ) , could you add a comment mentioning that's how it is used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this modification after we get the state be better suited for GetEndpointState (similar to how we pass in networkID and then populate it in each epInfo?)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not want to change the function for getEndpointState(). The comment has been added.

Copy link
Contributor

@QxBytes QxBytes Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mention it's used in the secondary endpoint client-- unless it's also being used in windows? Also if it does not change the logic, can you populate the epinfos right after you call GetEndpointState

@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 5 times, most recently from 4b7c752 to 5c01db3 Compare September 25, 2025 17:30
Copy link
Contributor

@QxBytes QxBytes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that cni/network/* has some issues with tests? Also please respond to the previous comments I made.

// When that happens, kubelet and containerd hang during sandbox creation or teardown.
// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace,
// triggering hot-unplug/re-register events and leaving the node in an unhealthy state.
plugin.nm.RemoveSecondaryEndpointFromPodNetNS(args.IfName, args.Netns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why we cannot delete using the normal delete flow and instead call this new method? What info is missing when we cannot retrieve data from CNS? If we can delete everything on CNI side without the CNS (which seems to be the case here), why have this new method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In normal Stateless CNI flow we need the delegated NIC Name which will be missing in this case. Since Stateless CNI cannot retrieve the state it cannot go through normal delete flow and we need this special case function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you can delete the secondary nics even without the delegated NIC name though, based on this function RemoveSecondaryEndpointFromPodNetNS -- why can't this be the normal flow?

}

// RemoveSecondaryEndpointFromPodNetNS removes the secondary endpoint from the pod netns
func (nm *networkManager) RemoveSecondaryEndpointFromPodNetNS(ifName string, netns string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems really odd to me-- manager.go is meant to deal with general state, endpoint creation/deletion that can apply to all endpoint/nic type scenarios, but secondary endpoints (add/remove) are an implementation detail that applies to only one of the various scenarios. This logic should be at the client level (like changes should be in secondary_endpoint_client_linux only and it should decide the how to handle it based on the parameters passed in). This applies to all the RemoveSecondaryEndpointFromPodNetNS and removeSecondaryEndpointFromPodNetNSImpl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, but all of the call to platform related functions are called from manager.go. I did not see a direct call to Implementation from CNI package.


// removeInterfacesfromNetnsPath finds and removes all interfaces from the specified netns path except the infra and non-eth interfaces.
func (client *SecondaryEndpointClient) RemoveInterfacesfromNetnsPath(infraInterfaceName string, netnspath string) error {
// Get VM namespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make a helper function for this namespace changing? Or if one already exists use that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already created a helper function to remove the interface form the netnsPath

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that RemoveInterfacesfromNetnsPath is a helper, but noticed that the code here https://github.com/Azure/azure-container-networking/pull/3967/files#diff-a2a6265dc83a2c15c13849f9f3f4b1f2758e06f0973093002e0077dd8f20d4efR155 also opens a network namespace-- this is what I am referring to

// ListIfNamesInNetns returns all interface names in the given netns path.
func listIfNamesInNetns(ns NamespaceInterface) ([]string, error) {
// Use the existing netlink interface to list links
links, err := vishnetlink.LinkList()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this can be mocked? ex: in your test we have a mock vishnetlink with that one method you're using LinkList and we return something that satisfies the vishnetlink.Link interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tired to mock this but still even mocking this is not enough to create a UT for the new releaseSecodnary function.

if iface == infraInterfaceName || !strings.Contains(iface, "eth") {
continue
}
if err := client.MoveInterfaceToHostNetns(iface, vmns); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm so why do we leave the infra eth0 in the pod namespace? Does the eth0 (infra) get cleaned up later on in the delete call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it like during async delete, we only have 1 endpoint info received (since can't contact CNS for others) whereas if CNS were up, we'd get 1 endpoint info for each (1 for infra, 1 for each delegated) and so we'd be able to rely on the existing logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Eth0 does not suffer from this issue similar to secondary interface and once the NETNS get removed i is working fine. For delgated interface like eth1 we need to find it and moving it back to host

@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 4 times, most recently from b342517 to 0118a49 Compare September 29, 2025 05:04
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Member

@tamilmani1989 tamilmani1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behzad-mir lets discuss offline on these changes. i feel it can be done in another way

// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace,
// triggering hot-unplug/re-register events and leaving the node in an unhealthy state.
// This workaround mitigates the issue by removing the secondary NIC from the pod netns when CNS is unreachable during DEL to provide the endpoint state.
if err = plugin.nm.RemoveSecondaryEndpointFromPodNetNS(args.IfName, args.Netns); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should still create file for cns to release ip allocated for pod

@behzad-mir behzad-mir force-pushed the swiftv2-stateless branch 3 times, most recently from 119e9bf to d5e59f1 Compare October 2, 2025 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants