Fix/analytics connection hang #141
Conversation
Thank you for this PR! One thing you'll probably need to do is rebase on develop. Thanks!
You can also see the CI is showing a linting error, so please do run the formatting and linting scripts locally before updating with the fix.
Just fixed any issues mentioned in the CI and ran the format and lint scripts again. This was indeed branched off develop.
@coderabbitai review
Looks mostly all good to me. Only thing I'd suggest is making it a debug log as there's no need to disrupt the user experience over this.
Great suggestion, @strickvl. I've committed the change. Thanks for your input. Regarding the other CodeRabbit suggestion, I'm assuming it's safe to ignore as the app uses a single instance of the analytics client?
@marwan37 yep sometimes the rabbit has good suggestions or catches small things. Today, not so much :) You can ignore them.
Nice work! This looks good to go from my side. I'll let my colleague @bcdurak give it a review as he originally worked on some of this analytics code.
@marwan37 sorry the line I suggested seems to have been too long. I'd break up that logger statement and then I'll rerun the CI.
@strickvl, I've adjusted the logger statement. Thanks for pointing that out. |
I left some comments on the changes. Feel free to ask any questions if anything is unclear.
src/mlstacks/analytics/client.py (Outdated)
@@ -34,6 +34,7 @@
 logger = getLogger(__name__)

 analytics.write_key = "tU9BJvF05TgC29xgiXuKF7CuYP0zhgnx"
+analytics.max_retries = 0
Lowering this number makes a lot of sense. Considering the timeout is set to 15 seconds by default, even a timeout case should never take that long. Perhaps 3 or 5 retries might be more suitable here.
Hello @bcdurak, I appreciate the review and the insights. It's worth noting that the CLI hung for ~10 minutes before exiting with the default settings, likely due to exponential backoff.
I just tested it with max_retries set to lower values, and here's what I got:
- 3: 5 seconds
- 5: 20 seconds
- 6: 40 seconds
Setting it to 5 sounds like a balanced approach, as you suggested. I'll make the change.
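For reference, these retry settings are plain module-level attributes on the Segment client. A minimal sketch of tuning them follows; the import path and the 15-second default timeout are assumptions based on the segment-analytics-python package discussed here, and the write key is a placeholder:

```python
# Minimal sketch, not the exact mlstacks code: tuning the Segment client's
# module-level retry behaviour. Import path assumes segment-analytics-python.
import segment.analytics as analytics

analytics.write_key = "<your-write-key>"  # placeholder, not a real key
analytics.max_retries = 5  # ~20 s worst case observed above vs. ~10 min with the defaults
analytics.timeout = 15     # per-request timeout in seconds (library default)
```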
src/mlstacks/analytics/client.py (Outdated)
    """
    if metadata is None:
        metadata = {}

    metadata.setdefault("event_success", True)

    with MLStacksAnalyticsContext() as analytics_context:
        return bool(analytics_context.track(event=event, properties=metadata))
    try:
This is where it gets a bit tricky.
The MLStacksAnalyticsContext is implemented as a context manager. Any Exception raised during the execution of things within the scope of this context manager will be handled by its __exit__ method. This way, we ensure that actual execution cannot fail, even if something goes wrong in the analytics. So this try-except is already covered by it.
You may have seen some error messages already, though. The reason is that, by default, the segment analytics Python package uses something called a Consumer, which creates separate threads to upload and send out the events. Due to its nature, any calls happening within these threads are out of the scope of our context manager. However, if something goes wrong, they handle it the same way with a try-catch and emit an error log right there, which you may have seen already.
I see two solutions if you would like to get rid of the error logs:
- You can disable the usage of this consumer paradigm by setting analytics.sync_mode to True. This way, the events will be sent out by the main thread, and the MLStacksAnalyticsContext will do all the error handling. However, in the case of an unresponsive server, this will block the main thread for (analytics.max_retries + 1) * analytics.timeout seconds, so it is not very ideal.
- You can try to disable the logger for the segment analytics package and implement a custom analytics.on_error method to handle the same error message as a debug message.
Personally, I would recommend the second solution (see the sketch below).
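A minimal sketch of what the second option could look like, assuming the module-level on_error hook exposed by segment-analytics-python; the handler name is illustrative and this is not the exact code that landed in this PR:

```python
# Illustrative sketch only: route Segment delivery failures to a debug log
# instead of the package's own error output. Assumes segment-analytics-python,
# whose consumer thread invokes `on_error` with the exception and the batch of
# events it failed to send.
import logging

import segment.analytics as analytics

logger = logging.getLogger(__name__)


def _on_analytics_error(error, batch):
    # Keep the CLI output clean; surface delivery failures only at debug verbosity.
    logger.debug("Failed to send %d analytics event(s): %s", len(batch), error)


analytics.on_error = _on_analytics_error
```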
Thank you for the thorough explanation about the MLStacksAnalyticsContext and Segment's consumer thread behavior. These issues were noted in the original GitHub ticket, which led me to initially consider a threading-based solution to address the main problem. However, as you mentioned, they are outside the scope of mlstacks' analytics client.
Given the nuances you've outlined, I agree that the second solution seems more appropriate. I'll work on implementing the analytics.on_error method with non-disruptive error handling.
I've implemented the custom on_error handler as recommended, including corresponding unit tests to verify its functionality. If there are any further adjustments or tests you think should be included, please let me know.
Edit: To resolve the test suite failures during CI, I've revised the test to generate a unique write_key per session, and reverted to the original key and configuration post-tests.
Edit 2: Circling back, it turns out the initial fix using a unique write_key didn't quite tackle the problem. I simplified the test to directly examine log outputs and removed the use of mocking. Thanks for bearing with me.
@marwan37 I cloned your fork + checked out your branch, but when running this I still get the following output when testing locally (i.e. it doesn't seem to pick up your custom error at all). Looking at their codebase, it's not clear to me whether the on_error callback is even expected to catch these.
@strickvl, yes I had those errors too. My apologies, I should have documented that behavior earlier. These errors seem to be handled internally by Segment and won't trigger the custom on_error handler.
Just a couple of changes will fix this so it works the way we want. We can disable the segment logger itself (using the suggested code), and then we don't see the logged output at all (the stuff we can't control). We tested on our end that the on_error handler is indeed triggering and outputting what we want when logging verbosity is set to DEBUG.
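Silencing the package's own logger could be as simple as the sketch below; the logger name "segment" is an assumption based on the package's module path:

```python
# Sketch: suppress the Segment package's internal error logging so that only
# the custom on_error handler's debug output remains visible.
import logging

logging.getLogger("segment").setLevel(logging.CRITICAL)
```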
@marwan37 can you run the format / linting script on the files on your end and push any changes that get made?
Force-pushed from d37cda0 to 4766b7f.
@marwan37 The changes look good. Thank you so much for your contribution 😃
Describe changes
I implemented a workaround that disables retries for sending analytics data, in response to issue #130. This ensures the CLI remains responsive even when the Segment analytics API is unreachable, due to network failures or blocking mechanisms like PiHole.
- Modified src/mlstacks/analytics/client.py to set max_retries to 0, which can be adjusted as needed.
- Wrapped the analytics_context.track call in a try/except block within the track_event function. This ensures the return False statement is reachable for error handling and logging exceptions (see the sketch below).
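A simplified sketch of that wrapping, not the exact code from the PR; logger and MLStacksAnalyticsContext are assumed to be the module-level names shown in the diffs above:

```python
# Simplified sketch of wrapping the track call so a delivery failure is logged
# and the function still returns False instead of raising.
def track_event(event, metadata=None):
    if metadata is None:
        metadata = {}
    metadata.setdefault("event_success", True)
    try:
        with MLStacksAnalyticsContext() as analytics_context:
            return bool(analytics_context.track(event=event, properties=metadata))
    except Exception as e:
        logger.debug("Sending analytics event failed: %s", e)
    return False
```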
Reference to Documentation
The Segment Python library documentation did not explicitly mention configurable options for max_retries, but the source code revealed a max_retries property in segment/analytics/client.py.
Testing
To simulate an unreachable analytics API domain at api.segment.io, I redirected all requests to this domain to 127.0.0.1 by adding the following entry in the /etc/hosts file on my Mac:
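The exact entry is not preserved on this page; based on the description (all requests to api.segment.io redirected to 127.0.0.1), it would be along the lines of:

127.0.0.1    api.segment.io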
Following this setup, I tested the deploy, breakdown, output, and destroy CLI commands, and confirmed each exits immediately, without hanging, after attempting to reach the analytics API domain once.
Pre-requisites
Please ensure you have done the following:
- I have updated the documentation accordingly.
- My branch is based off develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.
Types of changes
- Bug fix (non-breaking change)
Summary by CodeRabbit
- Introduced a new configuration parameter max_retries for enhanced control over analytics tracking.