Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow host to start before NATS #22

Closed
autodidaddict opened this issue Jun 3, 2021 · 5 comments
Closed

Allow host to start before NATS #22

autodidaddict opened this issue Jun 3, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@autodidaddict
Copy link
Member

In an effort to keep things as simple and easy and error-free as possible, we'd like to be able to start the host even if NATS isn't running. The host would essentially sit in an idle, non-functional, "not ready" state until it detects that NATS is available. Once NATS is available, it'll connect, make all its subscriptions, enable functionality, and put itself in the "ready" state.

@autodidaddict
Copy link
Member Author

SITREP: After spending basically the entire morning pairing on this, we discovered a number of things. The first is that it's really tricky and difficult and actually quite un-idiomatic to hold up/slow down/block the supervision hierarchy startup sequence in order to establish a connection to a NATS broker (or database or whatever).

Secondly, after consulting with someone who actually knows what they're doing, we came across the ConsumerSupervisor, which looks like it's a way to manage/maintain subscriptions in a connection-independent manner. This should, in theory, let us maintain our subscriptions across client restarts (network partition events).

There are also some issues with the code and complexity in the Gnat client that we may want to file as issues with the NATS client team.

@autodidaddict
Copy link
Member Author

Since we have two connections (one for RPC and one for the control interface), we could create 2 Gnat.Servers that handle the messages they need to handle from those topics, managed by a single ConsumerSupervisor.

We're currently putting this issue near the bottom of the backlog so we can get more mission-critical functionality out of the way. This will give us some time to stew on the most idiomatic (and hopefully simple) solution to this problem.

@QuinnWilton
Copy link

Based on what we chatted about, I think the ConsummerSupervisor is likely what you want. Note that you'll also need to use the ConnectionSupervisor in your supervision tree, in order to manage the connection.

@QuinnWilton
Copy link

QuinnWilton commented Jun 22, 2021

For what it's worth, I noticed this issue in their tracker, so it sounds like they're aware of some of the issues surrounding the connection handling not being super idiomatic. A lot of the suggestions and concerns brought up by others in that thread align with the issues I pointed out to you.

For a good reference on my suggestions around starting processes in a "disconnected" state, take a look at the linked timestamp in this talk: https://youtu.be/lg7M0h9eoug?t=985

The core idea is that processes that depend on external systems should avoid connecting to those systems during initialization, because they can't guarantee the existence or reachability of those systems. Instead, they should be implemented as a state machine with two states: :connected, and :disconnected, and the process should attempt to transition to the :connected state after initialization completes. If the connection fails, the process can leverage backoff + jitter to re-attempt the connection later.

In the case of the GNat process, I'd likely rework the server to use such a state machine, and to also track a list of the currently active subscriptions, alongside monitor references to the consumers behind those subscriptions. Now, if those consumers go down, the subscriptions can be cleaned up. Similarly, whenever the connection fails and is eventually re-established, those subscriptions can be re-initialized with the broker.

@autodidaddict
Copy link
Member Author

This has been fixed for quite some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants