Notification: Broken Jetpack connection banner doesn't dismiss #90758
Comments
Maybe it's not a false positive. See: https://github.com/Automattic/dotcom-forge/issues/7234#issuecomment-2116409760 |
We definitely run into actual broken connections that need fixing, too, and it would be good to address the issues behind those. But there are also quite a few cases where the user sees connection errors even when a debug reveals the connection is fine. In these cases, trying another browser or an incognito window works, so something is stuck in the browser itself. Logging out and back in seems to fix it, but I'm not sure folks would think to try that in their troubleshooting steps, so the error can be frustrating for them.
Related? #79324 |
We're seeing a similar case on A4A sites, but the root cause shouldn't be the same. Maybe the reason why it doesn't clear the error message is the same bug. |
Thanks @paulopmt1 - we just ran into that today in our WooCommerce tinkering at a meetup; same situation! And yes the error is stuck. That must be what's causing us to get new users with this error when everything seems fine. |
Thanks for sharing one more example of that @supernovia.
I also noticed that everything was working as expected. |
Until now, Luis and I have been working on fixing the A4A site creation issue, which is similar to this issue but it's not the same. Investigating this issue further, I found two interesting things:
Note: Our current cache clear CTA on the /hosting page won't clear that backend cache, so it will always last 5 minutes. Even when we (or Jetpack by itself) fix the connection, it can take up to 7 minutes for the user to notice, which is not ideal and can lead to the issues we see here. We could bring that down to 1 minute if we update our backend to cache a failed Jetpack connection for only 1 minute and a successful Jetpack connection for 5 minutes (as it is currently). In that case, our frontend would have a ~15s cache only for failed Jetpack connections (so multiple requests would still benefit from it) and 5 minutes for successful connections (as it is currently).
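A minimal sketch of the status-dependent cache lifetime proposed above, in TypeScript for illustration only. The class name, method names, and status labels are hypothetical and not taken from the actual backend; only the 1-minute and 5-minute lifetimes come from the comment.

```typescript
// Hypothetical sketch: the names here are illustrative, not the real backend code.
type ConnectionStatus = "healthy" | "broken";

interface CacheEntry {
  status: ConnectionStatus;
  expiresAt: number; // epoch milliseconds
}

const TTL_MS: Record<ConnectionStatus, number> = {
  healthy: 5 * 60 * 1000, // successful checks stay cached for 5 minutes (current behavior)
  broken: 60 * 1000, // failed checks expire after 1 minute (proposed)
};

class ConnectionStatusCache {
  private entries = new Map<number, CacheEntry>();

  // `now` is injectable so the expiry logic can be exercised without waiting.
  set(siteId: number, status: ConnectionStatus, now: number = Date.now()): void {
    this.entries.set(siteId, { status, expiresAt: now + TTL_MS[status] });
  }

  // Returns the cached status, or undefined when there is no entry or it has expired.
  get(siteId: number, now: number = Date.now()): ConnectionStatus | undefined {
    const entry = this.entries.get(siteId);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(siteId);
      return undefined;
    }
    return entry.status;
  }

  // Explicit invalidation, e.g. wired to a cache-clear action.
  clear(siteId: number): void {
    this.entries.delete(siteId);
  }
}
```

With this shape, a stale "broken" result is re-checked after at most one minute, while healthy results keep the longer lifetime and continue to absorb repeated requests.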
Nice, after this HE interaction (p1716493834606609/1716465093.319339-slack-CB0B2G43X), I found a way to simulate the bug:
Next step:
|
Found the minimum steps to reproduce the bug:
Screen.Recording.2024-05-24.at.11.10.32.mov |
Why does this flow trigger the Jetpack connection validation?
The "Pages" menu isn't the only one that triggers a failure. In fact, any page that tries to load fetchModuleList will trigger it, since that call fails and calls setJetpackConnectionMaybeUnhealthy. This bug is uncommon because few places in Calypso call it.
Why does the fetchModuleList call fail? Because it doesn't support simple sites (which is the state of our site in that flow) and will always return
Solution for this trigger: We could validate that the current site is_atomic before calling that jetpack-blogs API. That wouldn't fix the root cause of the issue, but it would avoid one important trigger of it.
The root cause question
Why don't we always have a
We introduced the
We update this option in a couple of places and have a dedicated endpoint that does that: fbhepr%2Skers%2Sjcpbz%2Sjc%2Qpbagrag%2Serfg%2Qncv%2Qcyhtvaf%2Sraqcbvagf%2Swrgcnpx%2Qnpgvir%2Qpbaarpgrq%2Qcyhtvaf.cuc%3Se%3Qqo5rp541%2310-og
Ask more about it here: p1716657490806279-slack-CBG1CP4EN
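The trigger-side mitigation described above (checking is_atomic before calling the jetpack-blogs API) could look roughly like the sketch below. This is not Calypso's actual code: the `Site` shape, the `ModuleFetcher` type, and both function names are assumptions made for illustration, and the real fetchModuleList may have a different signature.

```typescript
// Hypothetical site shape; only the `is_atomic` flag is named in the comment above.
interface Site {
  ID: number;
  is_atomic: boolean;
}

// Stand-in for the real module-list request, whose signature is assumed here.
type ModuleFetcher = (siteId: number) => Promise<string[]>;

// Simple sites are not supported by the endpoint, so only Atomic sites qualify.
function shouldFetchModuleList(site: Site): boolean {
  return site.is_atomic === true;
}

// Resolves to the module list for Atomic sites, and to null for simple sites
// without issuing the request, so no failure is recorded for them.
async function fetchModuleListIfAtomic(
  site: Site,
  fetchModuleList: ModuleFetcher,
): Promise<string[] | null> {
  if (!shouldFetchModuleList(site)) {
    return null;
  }
  return fetchModuleList(site.ID);
}
```

Because the request is skipped entirely for simple sites, the failure path that marks the connection as possibly unhealthy is never reached for them. As the comment notes, this only removes one trigger and leaves the root cause untouched.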
Isn't the site an Atomic site by then? Since you triggered the transfer from the hosting page, the primary URL was changed to a
cc'ing @Automattic/jetpack-vulcan on this, so they can look at the flow when this is triggered and the option populated. |
We recently changed the trigger for updating the Full Context: p9o2xV-46w-p2#comment-9261 Since there's no Please give us a ping if you need further assistance/clarifications/reviews related to the above! |
Not yet. The user navigates to it before the site goes Atomic; at that moment (second 20 of the video), the navigation is still on the simple site.
I see, so this is the change we made to its behavior.
Learn that we need to set In this diff we're releasing the feature to all new Atomic sites: D150120-code Here's the code in action: Screen.Recording.2024-05-27.at.11.38.43.mov |
We deployed the fix for this problem. So, new sites won't be affected anymore. |
Reading through the p2 post:
Did we also consider just clearing the error message cache once an HE fixes the connection via the Jetpack Debugger? Preferably, when an HE fixes the connection issue, they should then immediately be able to see the connection error message resolve. |
That makes sense, Eric. Since it's a low priority now, I'll return to this task by the middle of the following week and see how to address the final |
We have a fix for ensuring an Atomic site will always have the |
@fgiannar fixed the last piece of syncing jetpack_connection_active_plugins and it's working as expected: p9dueE-8gv-p2#comment-10464 We'll now address Eric's suggestion as the final step. |
The final piece was delivered here: D163816-code |
Quick summary
See p1715718239728809-slack-C029GN3KD
The summary of that thread is that some set of users see a notification that describes the Jetpack connection as being broken because the plugin is deactivated. But, even after fixing the connection, the notice doesn't dismiss.
In working with an HE on this issue, we fixed the issue by clearing IndexedDB and localStorage from the application tab of Chrome, after we noticed that the issue only showed for the HE in Chrome and not in Safari.
In talking to @supernovia, she suggests that somewhere around 2-3% of her interactions are this and that they ask users to log out and back in.
Steps to reproduce
Based on the reports, I would imagine that this is due to an intermittent connection issue that then persists. Based on that, I'm not sure what the repro steps are. This is how I would start though.
/_cli
for an atomic site and remove the blog or user connection secret
What you expected to happen
The notice to disappear after the connection is fixed.
What actually happened
The notice persists and requires HE intervention and the user logging out.
Impact
Some (< 50%)
Available workarounds?
Yes, easy to implement
Platform (Simple and/or Atomic)
No response
Logs or notes
No response