-
Notifications
You must be signed in to change notification settings - Fork 722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Is re-publishing messages after clean_state=True a deliberate design decision? How to handle Session Present = False? #733
Comments
This does not look quite correct to me. QOS0 messages are not acknowledged (so we have no way to confirm if the broker has received them) meaning that once a QOS0 message has been transmitted it's discarded. However if the connection is lost before the message is transmitted (or is down when Creating a new client would prevent the messages being resent (because this library holds the state in memory; a new client will have a fresh state).
I believe the library may deliver a message twice; this could occur when:
This could be addressed by checking Hopefully this helps a bit; note that I don't usually work on this library (just trying to help tidy up issues) so there may be errors in the above. |
@MattBrittan not really sure what more info is needed. It's a shame this didn't make the cut for V2.0 because this may end up being a breaking change. Not sure. Reading between the lines Paho is not correctly handling the "session present" flag in The correct behaviour here is hinted at in the standard (3.2.2.2):
All Paho needs to do here is to disconnect and raise an exception when Suggested ApproachTo avoid backward compatibility issues, make the current retry logic on "session present 0" an optional "feature". Let's call it Then if Ideally, Paho would offer some way for the application to know what is stuck in the outbound queue at that point. After that it's on the application developer. They can discard the Paho session and begin a new one, and it's on them to chose what to send and what not to send that was previously stuck in the outbound queue. |
Apologies; it looks like I may have partly misinterpreted your issue (was working through a lot of old issues at the time and must have misread the quote). I ended up focusing mostly on QOS0 which is not really relevant here. I then talked about issues handling QOSS2 messages that were being received (which is an issue, but you are focusing on sent messages). So with a focus on QOS1+ messages being transmitted from the client, when Firstly I think it's worth noting that, if I'm understanding you correctly, this will be a pretty rare situation. The following would need to occur:
(there is an alternative, which you allude to in your response ref Should the above happen then any QOS1/2 guarantees are really moot. The client has no way of knowing (or finding out) whether the message it published was received (the server will have effectively thrown away all info regarding the client). As such the choice to resend does not seem unreasonable (generally a duplicate message is better than no message; QOS2 will be broken either way due to the removal of the session state).
Interesting the V5 spec has something to say on this (it's more strict than the section of the V3 spec you quoted):
It does not look like this client currently adheres to this rule (the call to
You can actually handle this yourself if you want in the Anyway sorry about misunderstanding initially; I hope I've provided info relevant to your question this time! This is certainly something that could be addressed but I'm not sure that its a priority (and think there is a way of achieving what you want without changes to the library). |
Sounds promising. Let's stress test the idea. Are you able to tell me:
No. A simple re-deploy of the the broker can commonly trigger it. If the out-going broker instance doesn't wait to complete all QOS2 messages, then anything "in-flight" is going to get lost. Remember that QOS2 requires 4 messages to complete, so this is kinda likely if the broker shuts down too quickly.... ... depending on the broker, the new incoming broker won't be handed any session state information. So all clients just get told "session found 0".
No. That's the really dangerous assumption I'm getting at. You are assuming that duplication is safe for a message flagged as "exactly once". The client has explictly marked the message as "do not duplicate", accepting a performance hit for flagging it such. So assuming that duplication is okay and pragmatic seems very wrong indeed. Please see the post office scandle in the UK for explanation of why silently duplicating records can be bad. While I'm not suggesting anyone use MQTT for financial records, my point is that Paho has absolutely no way to know what damage it's doing by duplicating a message that was supposed to be "Exactly once". You are right that we are discussing an unrecoverable situation. So the better thing to do, much better thing to do, is tell the application and let it figure out what to do next. |
If your broker does not retain session information when re-deployed then you run the risk of message loss ("Exactly once delivery" is not achievable in this situation).
Unfortunately ease of use dictates that some decisions be made for the user (a lot of library users have minimal knowledge of the protocol and just expect the library to work). MQTT appears, at first glance, to be very simple however there are a surprising number of edge cases and libraries get used in ways the authors never considered. Anyway I believe it's possible to achieve your goal using I'll remove the "More info needed" flag but please do consider raising this as a new issue that includes elements of this discussion (due to the number of old issues I'm not sure how many people will review this one). |
I've finally found time to evaluate the I still think current behaviour is, very bluntly, wrong!
This is no edge case, it's a core feature described in the abstract of the standard.
Silently corrypting a user's data is not "ease of use". Sometimes things are hard no matter your design, and the right thing to do is put guard rails with well sign posted error messages so that users don't shoot themselves in the foot.
Oof. I can't agree with that. Maybe in earlier 3.x but not in 5. The standard is centered on some very simple core concepts which don't have much overlap to cause curious edge cases. However it is very easy to misread the protocol standard and orient a client API around some subtle misunderstanding. This then makes the API have surprising behaviour to the user. Case in point, the standard has a lot devoted to reconnection and maintaining a session state accross reconnects ensuring it's consistent. Yet Paho's design seems to ignore a lot of the founational understanding of this. Eg: it makes very little sense for paho to have built in auto-reconnect while expecting the user to directly set |
There will be numerious things in this library that are "wrong" because it's evolved over years (with numerous authors all having different needs) and, as we saw with the V2 release, change is hard!. So we need to be pragmatic here in terms of finding a solution that meets your needs (unless you have the time to write a new version that addresses the failings; this is the approach we attempted with the Go Client and I still have not got it to v1.0 :-) ). Pull requests are welcome but do need to be mindful of breaking users existing code.
This is actually a good example; many users don't really care about "exactly once" delivery, but do want to receive messages that have come in since their app was restarted (so Anyway I think it's best to focus on your specific issue. My thought was that in |
Yup, I know. You'll note my earlier comments back in Feb. You'll note my earlier comments
What I meant by this was that when I first filed the issue, it was because I was willing to file some PRs myself, but needed to understand the chance of them getting approved before I sunk 2 weeks of business hours into fixing them. Getting PRs dismissed because a maintainer isn't willing to have a particular problem fixed is a really frustraiting way to burn time. The problem isn't the time to write them, it's understanding that there is any willingness to accept such changes at the end of the process. It's the oppertunity for some kind of steer on what form the changes should take. Sadly nobody responded at all prior to 2.x being released and so I'm guessing that getting potentially breaking changes through are far less likely to be approved.
Indeed it is a good example. What you say there reinforces my reason for not filing any PRs yet: What's needed are a bunch of smaller changes some breaking a use case with others fixing it again from a different angle. I'll close this now in favour of #855 and seek to get some time to offer PRs towards that. |
Excellent - thanks for the new issue. One suggestion - perhaps note at the top that you are prepared to work on this and are looking to maximise the chance of a PR being accepted (I'm not sure if anyone else is currently in a position to take on a project like this, it will be a fair amount of work). |
I'm hunting for an issue explaining this in the documentation (found here), but I've not found one:
Was this a deliberate design decision or is it still up for someone to offer a "fix"?
I'm concerned because even though it might be possible to work around
clean_session=False
simply by creating a new client, I don't see any work around for Session Present = False 3.2.2.2 Session Present.That is if the MQTT broker loses the session for any reason, including administrative action, the client will re-send duplicate QOS 2 messages and the broker will have no mechanism to de-duplicate them. I don't see a way for code using Paho to handle Session Present = False itself to prevent duplicate QOS 2 messages.
The text was updated successfully, but these errors were encountered: