Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Mysensors HAL retry hack #1477

Open
wants to merge 3 commits into
base: development
Choose a base branch
from

Conversation

nekromant
Copy link

This MR implements my simple attempt to get more reliable packet delivery for actuators and exposes a few more parameters that can be tuned for optimal operation. Tested with nRF24L01+ modules.

  1. Implement software-based retransmissions in transportHAL. When MY_TRANSPORT_SEND_RETRIES is defined to a certain number, transportHALSend() will call the underlying transportSend() for up to N times, before failing. The actual retransmission count would be (N * 15) for nRF24L01, since the nRF does auto-retransmission on it's own.

A side-effect of this would be packet duplication in case the 'ack' is lost, since we don't 'reuse' payloads already stored in the radio. Reusing the payload stored already in the radio will work as well (tested by me with nRF24L01 on a different project), but that requires changes into the transport API and for every radio driver out there.

Test setup: A moderate (~20 nodes) mysensors network, some within 3-4 hops from the gateway within several small buildings. (e.g. heaters in the chicken coop). Heavy snow and rain caused significant link quality degradation that mostly affected nodes with enabled signing that run into problems requesting nonce-s.

  1. Allows user to override RF24_SET_ARD and RF24_SET_ARC via external define.

When running at 250Kbps for better range, setting the RF24_SET_ARD to the maximum improves connection quality. I'm still testing different parameters of this option, but it seems to help out when there are repeaters that get choked with traffic (e.g. we give them time to send the messages upwards, and

Signed-off-by: Andrew Andrianov <andrew@ncrmnt.org>
Signed-off-by: Andrew Andrianov <andrew@ncrmnt.org>
@virtual-maker
Copy link
Contributor

virtual-maker commented Feb 28, 2021

Hi @nekromant, great topic!
I assume you should replace space char by a tab in keywords.txt.
Additional I would rename RF24_SET_ARD and RF24_SET_ARC to be more like MySensors style.
May be into MY_RF24_SET_ARD and MY_RF24_SET_ARC?

@mfalkvidd
Copy link
Member

mfalkvidd commented Feb 28, 2021

Should there be some random (and maybe exponential) backoff/delay? If the first 15x repeats message is lost due to collisions, will transmitting N*15 times more in a continuous stream make the transmission situation better or worse?

@virtual-maker
Copy link
Contributor

virtual-maker commented Feb 28, 2021

@nekromant and @mfalkvidd
Currently I would not implement the repeat handling in the transport HAL, because there are multiple choices how to do this and current solution is specific for RF24 but there are some other transport physics with different behavior. Also it introduces the "same telegram - multiple times" problem.

Instead I recommend to add two hooks in transport HAL for send and receive. Then the user has the opportunity to handle the telegram repetition on send and receive side in his user code sketch. And it will not break any existing code, if the hooks are simply not used.

How to handle the repetition in user code would be a nice topic for the forum thread.

Also a discussion for optimal values for MY_RF24_SET_ARD and MY_RF24_SET_ARC may be depending from MySensors network setup would be helpful.

@nekromant
Copy link
Author

nekromant commented Mar 1, 2021

Should there be some random (and maybe exponential) backoff/delay?

@mfalkvidd I'm not sure if it's a good idea to call wait() in transportHAL, so I didn't implement the backoff. But I may be totally wrong, since I'm pretty new to this codebase.

If the first 15x repeats message is lost due to collisions, will transmitting N*15 times more in a continuous stream make the transmission situation better or worse?

I would speculate, that the 15x repeats failure occur when the uplink buffers a burst of packets and slowly chews them, sending further, not listening to whatever is sent, since the radio is only half duplex. If it's an atmega328p running at some 12Mhz working as a repeater, this may take a while.

MY_RF24_SET_ARD

Before learning about mysensors and moving on from my homebrew RPC-over-air to mysensors, I've been pretty much stuck on ARD=ARC=15. The rationale here is, that if the packet didn't go through, it's either a collision (unlikely) or the other side is not listening. Switching modes takes a long while, so there's no need to retry fast. I'm still playing with these settings and collecting stats, but I was surprised to learn that it's not ARD!=15 in mysensors.

Also it introduces the "same telegram - multiple times" problem.

There's a way to do the retransfer (at least for nRF24) that wouldn't lead to telegram duplication. See: https://github.com/nekromant/antares/blob/master/src/lib/wireless/rf24.c#L924

Summing up, I see 2 ways of implementing retransfers cleaner:

  • Passing a num_retries to the radio driver. The driver will make sure to send the payload up two N times without telegram duplication
  • Implement the callbacks for the user app. Perhaps NodeManager would be able to handle that logic, but avoiding telegram duplication without altering the drivers would be more difficult.

@virtual-maker
Copy link
Contributor

@nekromant
Thank you for your detailed answer. Lot of topics, e.g. half duplex, wait(), telegram duplication. For the wait() and telegram duplication issues I have some results from my experiments I did some weeks ago with a very similar approach like yours.

I would be happy if we could discuss this with other MySensors users at the MySensors forum thread I have started yesterday. May be some of the MySensors experts will join there and give a comment.

Otherwise I'm afraid this PR will become one more in the list of unfinished and unmerged ones. I would regret that very much, because the sometimes lost telegrams in MySensors are a serious problem from my point of view. A problem that should be solved. And I am convinced that it can be solved with your approach.

@nekromant
Copy link
Author

nekromant commented Mar 2, 2021

@virtual-maker Okay, moving on discussion over to mysensors forum thread. (Hopefully someone will approve my first post there soon)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants