Improve handling of non-blocking try-connect #106

Merged
merged 7 commits into from
Oct 29, 2024

erlingrj
Collaborator

This PR was prompted by the Zephyr federated example no longer working on our boards. It does several things:

  • Improve the error handling in the non-blocking try-connect. It checks for EINPROGRESS, then uses select to confirm that the socket is writable before checking the error. If the connect failed, the socket may be closed and reopened.
  • Reject tagged messages and actions before the start tag is resolved.
  • Move the serialization code into serialization.c/h.
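
The try-connect flow in the first bullet can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the names `connect_status_t`, `start_connect`, and `poll_connect_result` are hypothetical, and reading `SO_ERROR` with `getsockopt` is an assumption about what "checking the error" means here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical status type for this sketch (not the library's lf_ret_t). */
typedef enum { CONNECT_OK, CONNECT_IN_PROGRESS, CONNECT_FAILED } connect_status_t;

/* Put the socket in non-blocking mode and start the connect. */
connect_status_t start_connect(int fd, const struct sockaddr *addr, socklen_t addr_len) {
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
    return CONNECT_FAILED;
  }
  if (connect(fd, addr, addr_len) == 0) {
    return CONNECT_OK; /* connected immediately */
  }
  /* EINPROGRESS means the handshake continues in the background. */
  return (errno == EINPROGRESS) ? CONNECT_IN_PROGRESS : CONNECT_FAILED;
}

/* Poll a socket whose connect is pending, without blocking. */
connect_status_t poll_connect_result(int fd) {
  assert(fd >= 0 && fd < FD_SETSIZE); /* FD_SET is undefined outside this range */

  fd_set write_set;
  struct timeval timeout = {0, 0}; /* zero timeout: return at once */
  FD_ZERO(&write_set);
  FD_SET(fd, &write_set);

  int ready = select(fd + 1, NULL, &write_set, NULL, &timeout);
  if (ready < 0) {
    return CONNECT_FAILED;
  }
  if (ready == 0 || !FD_ISSET(fd, &write_set)) {
    return CONNECT_IN_PROGRESS; /* not writable yet: handshake still pending */
  }

  /* Writable only means the attempt finished; SO_ERROR says how it ended. */
  int so_error = 0;
  socklen_t len = sizeof(so_error);
  if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0 || so_error != 0) {
    return CONNECT_FAILED; /* the caller would close and reopen the socket here */
  }
  return CONNECT_OK;
}
```

A caller would invoke `start_connect` once and then call `poll_connect_result` on each later attempt until it stops returning `CONNECT_IN_PROGRESS`.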

@erlingrj erlingrj requested review from tanneberger and LasseRosenow and removed request for tanneberger October 29, 2024 01:42
Contributor

Memory usage after merging this PR will be:

Memory Report

action_microstep_test_c

section   from     to       increase (%)
text      54195    54417    0.41
data      752      752      0.00
bss       480      480      0.00
total     55427    55649    0.40

action_test_c

section   from     to       increase (%)
text      53998    54188    0.35
data      752      752      0.00
bss       480      480      0.00
total     55230    55420    0.34

delayed_conn_test_c

section   from     to       increase (%)
text      54719    54776    0.10
data      744      744      0.00
bss       480      480      0.00
total     55943    56000    0.10

event_payload_pool_test_c

section   from     to       increase (%)
text      18297    18297    0.00
data      624      624      0.00
bss       320      320      0.00
total     19241    19241    0.00

event_queue_test_c

section   from     to       increase (%)
text      27239    27239    0.00
data      728      728      0.00
bss       480      480      0.00
total     28447    28447    0.00

nanopb_test_c

section   from     to       increase (%)
text      42661    42884    0.52
data      904      904      0.00
bss       320      320      0.00
total     43885    44108    0.51

physical_action_test_c

section   from     to       increase (%)
text      55159    55381    0.40
data      769      769      0.00
bss       10240    10240    0.00
total     66168    66390    0.34

port_test_c

section   from     to       increase (%)
text      54594    54651    0.10
data      744      744      0.00
bss       480      480      0.00
total     55818    55875    0.10

reaction_queue_test_c

section   from     to       increase (%)
text      26951    26951    0.00
data      728      728      0.00
bss       480      480      0.00
total     28159    28159    0.00

request_shutdown_test_c

section   from     to       increase (%)
text      54757    54979    0.41
data      744      744      0.00
bss       480      480      0.00
total     55981    56203    0.40

shutdown_test_c

section   from     to       increase (%)
text      51869    51926    0.11
data      752      752      0.00
bss       10912    10912    0.00
total     63533    63590    0.09

startup_test_c

section   from     to       increase (%)
text      51200    51257    0.11
data      752      752      0.00
bss       10688    10688    0.00
total     62640    62697    0.09

tcp_channel_test_c

section   from     to       increase (%)
text      55382    58041    4.80
data      1160     1176     1.38
bss       11072    11072    0.00
total     67614    70289    3.96

timer_test_c

section   from     to       increase (%)
text      51101    51158    0.11
data      744      744      0.00
bss       10720    10720    0.00
total     62565    62622    0.09

if (new_socket >= 0) {
self->client = new_socket;
FD_SET(new_socket, &self->set);
static lf_ret_t check_if_socket_is_writable(int fd) {
Collaborator

This and the other static functions should maybe start with TcpIpChannel_. At least that is our current code style, as far as I remember.
I personally prefer static functions to start with an underscore. It makes it clearer, both to me and in the IDE, which functions are actually public. But it is better to keep it consistent throughout the project.

Collaborator Author

Hmm. The TcpIpChannel_ prefix was initially meant for naming functions that have many implementations for different "subclasses". They implement the corresponding functions in the public API (which are the function pointers on the struct). For purely static functions it might be different. But it might get messy if we don't pick a single convention. I will follow your suggestion for now.
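
To illustrate the convention being discussed, here is a small sketch of a "subclass" whose prefixed functions implement function pointers on a base struct. All names (`BaseChannel`, `ExampleChannel`, `fd_is_valid`) are made up for this example and are not taken from the code base; only the general pattern follows the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Public API: function pointers on the base struct (hypothetical names). */
typedef struct BaseChannel BaseChannel;
struct BaseChannel {
  bool (*is_connected)(BaseChannel *self);
};

/* A "subclass". The base must be the first member so the cast below is valid. */
typedef struct {
  BaseChannel super;
  int fd;
} ExampleChannel;

/* Purely internal helper: not part of the API, so it carries no class prefix here. */
static bool fd_is_valid(int fd) { return fd >= 0; }

/* Implementation of an API function: carries the subclass prefix. */
static bool ExampleChannel_is_connected(BaseChannel *untyped_self) {
  ExampleChannel *self = (ExampleChannel *)untyped_self;
  return fd_is_valid(self->fd);
}

/* The constructor wires the prefixed implementation into the API struct. */
void ExampleChannel_ctor(ExampleChannel *self, int fd) {
  assert(self != NULL);
  self->fd = fd;
  self->super.is_connected = ExampleChannel_is_connected;
}
```

Callers only go through `channel.super.is_connected(&channel.super)`, so the naming of the static functions is purely a readability choice. One caveat on the underscore alternative: ISO C reserves file-scope identifiers that begin with an underscore, which is an argument for a different marker.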

@@ -280,19 +403,6 @@ static void TcpIpChannel_free(NetworkChannel *untyped_self) {
}

void TcpIpChannel_ctor(TcpIpChannel *self, const char *host, unsigned short port, int protocol_family, bool server) {
FD_ZERO(&self->set);
Collaborator

Moving this to the reset function is very nice
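
As a rough picture of why this helps: when the fd_set is cleared in the same place where the socket is recreated, the constructor and the failure path can share one function. The sketch below assumes a simplified struct and a hypothetical `SketchChannel_reset_socket`; the real `TcpIpChannel_reset_socket` is not shown in this conversation and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Simplified, hypothetical channel state for this sketch. */
typedef struct {
  int fd;
  fd_set set;
} SketchChannel;

/* Close the old socket (if any), clear the fd_set, and open a fresh socket.
 * Returns 0 on success and -1 if no new socket could be created. */
int SketchChannel_reset_socket(SketchChannel *self) {
  assert(self != NULL);
  FD_ZERO(&self->set); /* drop any stale descriptors from the set */
  if (self->fd >= 0) {
    close(self->fd);
  }
  self->fd = socket(AF_INET, SOCK_STREAM, 0);
  return (self->fd >= 0) ? 0 : -1;
}
```

A constructor would set `fd` to -1 and call the reset function once, and a failed connect would call it again before the next attempt.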

LF_ERR(NET, "Connect failed errno=%d", errno);
self->client_connect_in_progress = false;
TcpIpChannel_reset_socket(self);
return LF_TRY_AGAIN;
Collaborator

At first it was a little confusing to me that both the in-progress case and the failed case return TRY_AGAIN, but I guess it makes some sense from the NetworkChannel perspective.
In both cases we need to try again.

But I see one argument for differentiating the two here.

In the Zephyr example you sleep for some time when you get a TRY_AGAIN, but as far as I can see this only makes sense for the EINPROGRESS case.

After a reset_socket we could just try again immediately.

So maybe return LF_IN_PROGRESS for EINPROGRESS and only wait in that case?

Collaborator

We could also be more transparent and return LF_CONNECTION_FAILED instead of LF_TRY_AGAIN. The user of the API could then decide whether to try again or not, I guess?
But I am also okay with keeping TRY_AGAIN. It has the advantage of telling the user what to do ;)

Collaborator

@LasseRosenow LasseRosenow Oct 29, 2024

Another way would be LF_CONNECTION_IN_PROGRESS_TRY_AGAIN and LF_CONNECTION_FAILED_TRY_AGAIN

But maybe these are getting a little bit long now :D

Collaborator Author

I went for LF_IN_PROGRESS and LF_TRY_AGAIN. On Zephyr we do a short sleep before trying again.
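
One way a caller could use the two return codes is sketched below, following the policy suggested earlier in this thread: sleep only while a connect is in progress and retry immediately after a reset. The enum is a local stand-in (its values and the `LF_OK` member are assumptions), and `connect_with_retry`, the callback types, and the scripted fake are hypothetical. The actual Zephyr example may sleep in both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the library's return type; values are assumptions. */
typedef enum { LF_OK, LF_IN_PROGRESS, LF_TRY_AGAIN } lf_ret_t;

typedef lf_ret_t (*try_connect_fn)(void *ctx);
typedef void (*sleep_fn)(void *ctx);

/* Returns the number of attempts used, or -1 if max_attempts ran out. */
int connect_with_retry(try_connect_fn try_connect, sleep_fn short_sleep, void *ctx, int max_attempts) {
  assert(try_connect != NULL && short_sleep != NULL);
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    switch (try_connect(ctx)) {
    case LF_OK:
      return attempt;
    case LF_IN_PROGRESS:
      short_sleep(ctx); /* handshake pending: give it some time */
      break;
    case LF_TRY_AGAIN:
      break; /* socket was reset: retry right away */
    }
  }
  return -1;
}

/* Scripted fake channel, used only to exercise the loop without a network. */
typedef struct {
  const lf_ret_t *script; /* results to return, in order */
  int next;               /* index of the next scripted result */
  int sleeps;             /* how many times the loop slept */
} fake_channel_t;

lf_ret_t fake_try_connect(void *ctx) {
  fake_channel_t *fake = (fake_channel_t *)ctx;
  return fake->script[fake->next++];
}

void fake_sleep(void *ctx) {
  fake_channel_t *fake = (fake_channel_t *)ctx;
  fake->sleeps++;
}
```

With the script `LF_IN_PROGRESS, LF_TRY_AGAIN, LF_IN_PROGRESS, LF_OK`, the loop connects on the fourth attempt and sleeps twice, once for each in-progress result.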

@LasseRosenow
Collaborator

Looking good to me, I left some small comments :)
And nice to have more LF_DEBUG etc. That is gonna help a lot!

Contributor

Coverage after merging try-connect into main will be

72.02%

Coverage Report
File | Stmts | Branches | Funcs | Lines | Uncovered Lines
src
   action.c | 92.21% | 81.25% | 100% | 94.74% | 23, 33, 39–42
   builtin_triggers.c | 90.24% | 70% | 100% | 96.43% | 14, 18, 37, 40
   connection.c | 80.69% | 53.85% | 100% | 89.69% | 10, 101, 107, 11, 120–121, 133–134, 14, 14, 140, 142, 17–18, 18, 18–19, 21, 23–24, 29, 44, 47, 52, 57–59, 94
   environment.c | 88.46% | 83.33% | 83.33% | 90.74% | 28, 35–36, 76–78, 9
   event.c | 94.44% | 90% | 100% | 95.65% | 10, 4
   federated.c | 0% | 0% | 0% | 0% | 100–102, 102, 102–103, 103, 103–105, 107, 11, 110–111, 113–117, 119, 12, 120–124, 126, 126, 126–129, 131, 131, 131–133, 133, 133–134, 138–139, 139, 139, 14, 142–143, 147–149, 15, 151, 151, 151, 153–157, 16, 160, 160, 160–163, 166–167, 167, 167–168, 17, 170–171, 174–175, 180–181, 181, 181–182, 184, 186, 186, 186–189, 189, 189, 189, 189, 19, 19, 19, 190–199, 20, 20, 20, 200–201, 205, 208, 208, 208–210, 214, 217–218, 218, 218, 218–219, 22, 22, 22, 220–226, 228, 23, 230, 234–239, 24, 24, 24, 240–241, 245–246, 248–249, 25, 251–254, 256, 256, 256–258, 26, 260, 30–31, 35–42, 42, 42–43, 43, 43–44, 44, 44, 47–48, 50–53, 55, 55, 55–58, 60, 62, 64, 64, 64–65, 67–68, 68, 68–69, 71–72, 74, 78–79, 81–83, 86, 88–93, 95–97
   logging.c | 73.21% | 60% | 100% | 75% | 24, 24–27, 37–39, 46, 46–49, 59–60
   port.c | 90.91% | 58.33% | 100% | 100% | 10, 15, 19, 24–25
   queues.c | 90.74% | 80.77% | 100% | 94.90% | 102, 107, 113, 21–23, 46–47, 59–60, 83–87
   reaction.c | 90.41% | 75% | 100% | 97.78% | 18, 20, 24, 43–44, 54, 56
   reactor.c | 67.14% | 40% | 100% | 76.09% | 16, 19–20, 20, 20–21, 21, 21–22, 24, 38–39, 42–43, 43, 43–44, 44, 44–45, 47, 60–61
   scheduler.c | 81.15% | 64.77% | 94.12% | 87.02% | 102, 104, 104, 111, 125, 17, 174, 177, 177, 177–180, 182–183, 183, 183–184, 204, 224–225, 231–233, 27, 277–278, 282, 286–287, 305, 31, 36, 52–54, 54, 54–56, 56, 56, 58, 58, 58–59, 61–62, 62, 62–63, 69–72, 80–81, 85
   serialization.c | 50% | 50% | 50% | 50% | 16–17, 26–27, 33–35, 38–40
   tag.c | 40.19% | 31.48% | 60% | 47.92% | 14, 14–15, 17, 17–18, 23–24, 24, 24, 24, 24–25, 27, 27, 27, 27, 27–28, 30, 30, 30–31, 33–34, 34, 34–35, 37, 37, 37, 37, 37–38, 40, 40, 40, 40, 40–41, 43, 53–54, 63, 63–64, 83–85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85–87, 89
   timer.c | 94.59% | 66.67% | 100% | 100% | 14, 25
   trigger.c | 100% | 100% | 100% | 100% |
src/platform/posix
   posix.c | 79.41% | 55.56% | 85.71% | 84.29% | 100, 16, 18, 20–21, 34–36, 38–40, 48–49, 62, 67, 79, 82, 88, 94
   tcp_ip_channel.c | 61.82% | 50% | 85.71% | 65.94% | 103, 105, 105, 105–107, 109–110, 110, 110–111, 111, 111–112, 114–115, 117, 117, 117, 119–120, 122–123, 127, 129–130, 130, 130–131, 133, 133, 133–134, 136, 144, 151, 156–157, 162–165, 170–171, 171, 171–174, 174, 174–177, 179–182, 184, 184, 184–186, 188–191, 194, 222–224, 231, 236–238, 246, 246–248, 25–26, 26, 26–27, 274–275, 275, 275–276, 28, 281–282, 299–300, 304–305, 316, 319, 32, 324, 33–34, 343–344, 347–348, 354–355, 367, 369, 37, 373–374, 38, 388–389, 39, 393–394, 398–399, 60–61, 66–67, 71–72, 92, 92, 92, 96–97

@erlingrj erlingrj merged commit 6a64061 into main Oct 29, 2024
6 checks passed
@erlingrj erlingrj deleted the try-connect branch October 29, 2024 18:17