Add UDP source IP into golang dialer https connection for Edgeview #4551

Merged 1 commit on Jan 29, 2025

@naiming-zededa commented Jan 26, 2025

  • On certain EVE devices, even after walking through the various ports, the UDP source IP address stayed stuck on one particular IP address; the DNS query then failed, and so did the Edgeview connection.
  • Apply the retry backoff only after walking through all the interfaces, to speed up the initial connection.

@eriknordmark left a comment

Given that we have this functionality (plus the functionality to do DNS lookups per interface, and caching) in pkg/pillar/zedcloud/send.go, would it make sense to use that package?

@naiming-zededa
Contributor Author

I think this code is only for the Edgeview websocket connection, not for sending packets; once connected, it is done. It is not easy to switch to the send.go functions. Also, in the current implementation Edgeview does not subscribe to pillar services such as devicenetworkstatus. The original intention was to be able to run this even on generic Linux machines.

@eriknordmark
Contributor

Then I don't understand. Setting the source address is useful for selecting a network port in EVE-OS (because we configure PBR rules for this purpose), but it doesn't do that on a generic linux machine - it merely sets the source address but the traffic might be routed out the wrong port.

@naiming-zededa
Contributor Author

The logic here in Edgeview is to find the interfaces that the default routes point to and get the port IP addresses of those interfaces, then walk through them one by one, plus one extra attempt that does not use any source IP address. Say the device has default routes pointing out through 'eth1' and 'eth2': the Edgeview connection walk will use the source IP of eth1, then eth2, then no source IP. It will keep trying those three options. It is a simple approach compared to how EVE currently sends packets to the controller.

@eriknordmark
Contributor

But that is useless on Linux out of the box.
The fact that the source IP is set to the IP address on eth0 has no effect on routing. Thus your three attempts might all be routed out eth1 - there isn't any round-robin in the kernel when selecting among multiple default routes.

This works on EVE-OS because we set up ip rules to use a different routing table based on the source IP address. That is done so that the routing table used only has the routes which point out that particular port. So your code is useful on EVE-OS but not on Linux itself.

Also, on EVE-OS, in addition to the interface selection the code in send.go also makes the DNS lookups use the matching interface and we presumably need that logic for EdgeView as well for it to be robust in e.g., LTE/5G setups with walled garden DNS servers. That was why I was wondering if you could use that package at least when building for EVE-OS.

@naiming-zededa
Contributor Author

Yes, this source IP scheme currently works only on EVE-OS. But normal Linux probably does not have this complexity, and the intention was not to pull in a lot of EVE-specific imports. Also, looking at the send.go implementation, its functions are mostly internal to the packet-sending path, so it is not easy to adapt that facility to the Edgeview code in its current form. I can add a comment on this.

@eriknordmark
Contributor

OK.

- On certain EVE devices, even after walking through the various ports,
  the UDP source IP address stayed stuck on one particular IP address;
  the DNS query then failed, and so did the Edgeview connection.
- Change the retry connection backoff to apply after walking through all
  the interfaces, to speed up the initial connection time.

Signed-off-by: Naiming Shen <naiming@zededa.com>

@eriknordmark left a comment

LGTM

@eriknordmark merged commit daf734e into lf-edge:master Jan 29, 2025
41 checks passed
@naiming-zededa deleted the naiming-edgeview-udp branch January 31, 2025 20:09
@naiming-zededa
Contributor Author

@eriknordmark should this be backported to 12 and/or 13 stable, since we have seen this issue on customer's devices?