Fix/analytics connection hang #141
Conversation
Thank you for this PR! One thing you'll probably need to do is rebase on develop. Thanks!
You can also see the CI is showing a linting error, so please do run the formatting and linting scripts locally before updating with the fix.
Just fixed any issues mentioned in the CI and ran the format and lint scripts again. This was indeed branched off develop.
@coderabbitai review
Looks mostly all good to me. Only thing I'd suggest is making it a debug log as there's no need to disrupt the user experience over this.
Great suggestion, @strickvl. I've committed the change. Thanks for your input. Regarding the other CodeRabbit suggestion, I'm assuming it's safe to ignore as the app uses a single instance of the analytics client?
@marwan37 yep sometimes the rabbit has good suggestions or catches small things. Today, not so much :) You can ignore them.
Nice work! This looks good to go from my side. I'll let my colleague @bcdurak give it a review as he originally worked on some of this analytics code.
@marwan37 sorry the line I suggested seems to have been too long. I'd break up that logger statement and then I'll rerun the CI.
@strickvl, I've adjusted the logger statement. Thanks for pointing that out. |
I left some comments on the changes. Feel free to ask any questions if anything is unclear.
src/mlstacks/analytics/client.py (Outdated)
@@ -34,6 +34,7 @@
 logger = getLogger(__name__)

 analytics.write_key = "tU9BJvF05TgC29xgiXuKF7CuYP0zhgnx"
+analytics.max_retries = 0
Lowering this number makes a lot of sense. Considering the timeout is set to 15 seconds by default, even a timeout case should never take that long. Perhaps 3 or 5 retries might be more suitable here.
Hello @bcdurak, I appreciate the review and the insights. It's worth noting that the CLI hung for ~10 minutes before exiting with the default settings, likely due to exponential backoff.
I just tested it with max_retries set to lower values, and here's what I got:
- 3: 5 seconds
- 5: 20 seconds
- 6: 40 seconds
Setting it to 5 sounds like a balanced approach, as you suggested. I'll make the change.
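For reference, these retry settings are plain module-level attributes on the Segment client. A minimal sketch of tuning them follows; the import path and the 15-second default timeout are assumptions based on the segment-analytics-python package discussed here, and the write key is a placeholder:

```python
# Minimal sketch, not the exact mlstacks code: tuning the Segment client's
# module-level retry behaviour. Import path assumes segment-analytics-python.
import segment.analytics as analytics

analytics.write_key = "<your-write-key>"  # placeholder, not a real key
analytics.max_retries = 5  # ~20 s worst case observed above vs. ~10 min with the defaults
analytics.timeout = 15     # per-request timeout in seconds (library default)
```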
src/mlstacks/analytics/client.py (Outdated)
    """
    if metadata is None:
        metadata = {}

    metadata.setdefault("event_success", True)

    with MLStacksAnalyticsContext() as analytics_context:
        return bool(analytics_context.track(event=event, properties=metadata))
    try:
This is where it gets a bit tricky.
The MLStacksAnalyticsContext is implemented as a context manager. Any Exception raised during the execution of things within the scope of this context manager will be handled by its __exit__ method. This way, we ensure that actual execution cannot fail, even if something goes wrong in the analytics. So this try-except is already covered by it.
You may have seen some error messages already, though. The reason is that, by default, the segment analytics Python package uses something called a Consumer, which creates separate threads to upload and send out the events. Due to its nature, any calls happening within these threads are out of the scope of our context manager. However, if something goes wrong, they handle it the same way with a try-catch and emit an error log right there, which you may have seen already.
I see two solutions if you would like to get rid of the error logs:
- You can disable the usage of this consumer paradigm by setting analytics.sync_mode to True. This way, the events will be sent out by the main thread, and the MLStacksAnalyticsContext will do all the error handling. However, in the case of an unresponsive server, this will block the main thread for (analytics.max_retries + 1) * analytics.timeout seconds, so it is not very ideal.
- You can try to disable the logger for the segment analytics package and implement a custom analytics.on_error method to handle the same error message as a debug message.
Personally, I would recommend the second solution (see the sketch below).
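A minimal sketch of what the second option could look like, assuming the module-level on_error hook exposed by segment-analytics-python; the handler name is illustrative and this is not the exact code that landed in this PR:

```python
# Illustrative sketch only: route Segment delivery failures to a debug log
# instead of the package's own error output. Assumes segment-analytics-python,
# whose consumer thread invokes `on_error` with the exception and the batch of
# events it failed to send.
import logging

import segment.analytics as analytics

logger = logging.getLogger(__name__)


def _on_analytics_error(error, batch):
    # Keep the CLI output clean; surface delivery failures only at debug verbosity.
    logger.debug("Failed to send %d analytics event(s): %s", len(batch), error)


analytics.on_error = _on_analytics_error
```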
Thank you for the thorough explanation about the MLStacksAnalyticsContext and Segment's consumer thread behavior. These issues were noted in the original GitHub ticket, which led me to initially consider a threading-based solution to address the main problem. However, as you mentioned, they are outside the scope of mlstacks' analytics client.
Given the nuances you've outlined, I agree that the second solution seems more appropriate. I'll work on implementing the analytics.on_error method with non-disruptive error handling.
I've implemented the custom on_error handler as recommended, including corresponding unit tests to verify its functionality. If there are any further adjustments or tests you think should be included, please let me know.
Edit: To resolve the test suite failures during CI, I've revised the test to generate a unique write_key per session, and reverted to the original key and configuration post-tests.
Edit 2: Circling back, it turns out the initial fix using a unique write_key didn't quite tackle the problem. I simplified the test to directly examine log outputs and removed the use of mocking. Thanks for bearing with me.
@marwan37 I cloned your fork + checked out your branch, but when running this I still get the following output when testing locally (i.e. it doesn't seem to pick up your custom error at all). Looking at their codebase, it's not clear to me whether the on_error callback is even expected to catch these.
@strickvl, yes I had those errors too. My apologies, I should have documented that behavior earlier. These errors seem to be handled internally by Segment and won't trigger the custom on_error handler.
Just a couple of changes will fix this so it works the way we want. We can disable the segment logger itself (using the suggested code), and then we don't see the logged output at all (the stuff we can't control). We tested on our end that the on_error handler is indeed triggering and outputting what we want when logging verbosity is set to DEBUG.
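Silencing the package's own logger could be as simple as the sketch below; the logger name "segment" is an assumption based on the package's module path:

```python
# Sketch: suppress the Segment package's internal error logging so that only
# the custom on_error handler's debug output remains visible.
import logging

logging.getLogger("segment").setLevel(logging.CRITICAL)
```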
@marwan37 can you run the format / linting script on the files on your end and push any changes that get made?
Force-pushed from d37cda0 to 4766b7f.
@marwan37 The changes look good. Thank you so much for your contribution 😃
Describe changes
I implemented a workaround that disables retries for sending analytics data, in response to issue #130. This ensures the CLI remains responsive even when the Segment analytics API is unreachable, due to network failures or blocking mechanisms like PiHole.
- Modified src/mlstacks/analytics/client.py to set max_retries to 0, which can be adjusted as needed.
- Wrapped the analytics_context.track call in a try/except block within the track_event function. This ensures the return False statement is reachable for error handling and logging exceptions (see the sketch below).
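A simplified sketch of that wrapping, not the exact code from the PR; logger and MLStacksAnalyticsContext are assumed to be the module-level names shown in the diffs above:

```python
# Simplified sketch of wrapping the track call so a delivery failure is logged
# and the function still returns False instead of raising.
def track_event(event, metadata=None):
    if metadata is None:
        metadata = {}
    metadata.setdefault("event_success", True)
    try:
        with MLStacksAnalyticsContext() as analytics_context:
            return bool(analytics_context.track(event=event, properties=metadata))
    except Exception as e:
        logger.debug("Sending analytics event failed: %s", e)
    return False
```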
Reference to Documentation
The Segment Python library documentation did not explicitly mention configurable options for max_retries, but the source code revealed a max_retries property in segment/analytics/client.py.
Testing
To simulate an unreachable analytics API domain at api.segment.io, I redirected all requests to this domain to 127.0.0.1 by adding the following entry in the /etc/hosts file on my Mac:
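The exact entry is not preserved on this page; based on the description (all requests to api.segment.io redirected to 127.0.0.1), it would be along the lines of:

127.0.0.1    api.segment.io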
Following this setup, I tested the deploy, breakdown, output, and destroy CLI commands, and confirmed each exits immediately, without hanging, after attempting to reach the analytics API domain once.
Pre-requisites
Please ensure you have done the following:
- I have updated the documentation accordingly.
- My branch is based off develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.
Types of changes
- Bug fix (non-breaking change)
Summary by CodeRabbit
- Introduced a new configuration parameter max_retries for enhanced control over analytics tracking.