Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProducerPool - easier remote publishing #311

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jehiah
Copy link
Member

@jehiah jehiah commented Nov 20, 2020

go-nsq does not provide good primitives to handle failures when publishing to a nsq instances; Handling publish errors is left as an exercise for the developer when that logic is important for reliable message creation. Common guidance has been to prefer publishing to a colocated nsqd instance, but that configuration is less desirable in many cloud environments.

I'm proposing we add a ProducerPool which provides a Publish interface that can be configured with multiple nsqd instances and which will retry publishing to another instance on error. This will provide a simple way for applications publishing messages to handle write failures in a fault tolerant way.

Based on outcome of nsqio/nsq#1300 this producer may also lazily organize nsqd instances into groupings of node local, zone local, region local or global and provide prioritization base on topology.

Functionally this is similar to the reliable publishing in nsq_to_nsq where publish happens in a round robin, or host pool to a pool of nsqd instances (but without the same backpressure).

cc: nsqio/nsq#1254 which tracks facilitating use of nsq in a cloud environment.

@mreiferson
Copy link
Member

Sounds good 👍

@jehiah
Copy link
Member Author

jehiah commented Dec 11, 2020

Thanks to @jharshman for contributing a PublishAsync implementation. I still want to work on making this transparently pick the best nsqd to write to.

@ploxiln
Copy link
Member

ploxiln commented Dec 15, 2020

another possibly related idea: "publish to N different nsqd" (e.g. 2) for users that want stronger guarantees that no messages are ever lost

@mreiferson
Copy link
Member

Not sure if this is ready for review, but a few things as is:

  • Looks like the target nsqd are static in a ProducerPool, meaning if the list needs to change, a user is expected to discard a ProducerPool instance and create a new one with a new list (and new connections). If so, then ProducerPool needs some sort of Close() func to cleanup.
  • PublishAsync shouldn't be managing its own goroutine directly, perhaps a ProducerPool instance should manage the lifecycle of a goroutine that handles retries?

@jehiah
Copy link
Member Author

jehiah commented Dec 29, 2020

Not sure if this is ready for review, but a few things as is:

No, not RFR (see below) - but I think it has some useful contents and figuring out the right way to expose initialization needs some thought so definitely RFC, so comments are good.

Based on outcome of nsqio/nsq#1300 this producer may also lazily organize nsqd instances into groupings of node local, zone local, region local or global and provide prioritization base on topology.


PublishAsync shouldn't be managing its own goroutine directly, perhaps a ProducerPool instance should manage the lifecycle of a goroutine that handles retries?

I don't like how this implementation spawns goroutines, but i don't really like needing to mange a long lived goroutine either; This implementation should be "correct" so that's a start. I'd love this targeted an interface (I have a version in use that does) that would make it easier to land testing, but this package needs some re-work to get to a point of targeting an interface - sort of out of scope of this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants