nsq: DRAINING mode #1302
SGTM, I suspect this one is going to be a bit tricky.
My gut tells me that, as a first pass, trying an implementation that waits for all topics/channels to be empty and then exits will likely avoid the "premature client reconnect after close" problem.
A little confused by this — my understanding is that this proposal isn't intended to modify the existence of topics/channels, so my answer would be topics/channels should remain present if an
IMO yes, we must provide a mechanism to force a (clean) shutdown. Maybe even offer a timeout? Minor:
Also love that this and #1300 are labeled
I should comment that I don't yet have a perfectly clear idea of the implementation for this; it will be a chore!
Perhaps there is a case for both? My intention is to target a use case where an nsqd is going away (i.e. removed from rotation), and by the time it's done draining there is nothing left on that nsqd instance. From that context I'm leaning towards "delete" functionality, where topics disappear as they are drained. If you are trying to remove an nsqd instance from rotation where that instance had 10 different topics, but just one or two with notable backlogs, it would be desirable to have the topics that drain quickly deleted. Deleting promotes better cluster hygiene: you don't end up with an nsqd instance that no longer receives messages on a topic but still gets consumer connections for it, spreading RDY thin (think of a topic that takes a day to drain in some odd circumstance). I've used the word "drain" because I think it's the best term, but what I really mean is the process of removing an nsqd from rotation.
I had
agreed. ideas?
Got it. In that case, the simplest way might be to proactively send a tombstone to
🤷 might make sense at the top-level?
I think we are on the same page; you wouldn't tombstone until the actual removal, so I don't think that affects clients draining. Currently the TCP protocol for lookupd doesn't support tombstoning, but that would be easy to resolve if needed. It might also not be critical if nsqd rejects the creation of new topics while it's draining, since that would inhibit new subscriptions after a topic is deleted.
👍 I think I have enough feedback here to start on an implementation; then we can move to discussing the tradeoffs of a concrete implementation.
To facilitate running `nsqd` in environments where the host isn't long-running, and to facilitate operations around managing a cluster, a new "draining" mode will be introduced to `nsqd`.

An `nsqd` instance in "draining" mode will:

- … `/info` endpoint

Clients that use an HA approach of pooling multiple nsqds for publishing messages (e.g. nsqio/go-nsq#311) are expected to transparently tolerate a host in draining mode.
Implementation Plan
A new `--sigterm=drain` CLI flag will enable this new behavior. Existing functionality will be preserved with the argument `--sigterm=clean-shutdown`.
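A minimal sketch of validating the proposed `--sigterm` flag with Go's standard `flag` package, defaulting to today's behavior. The `sigtermMode` type and `parseSigterm` helper are hypothetical; nsqd's real option parsing and its SIGTERM handler are not shown.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// sigtermMode is a hypothetical flag.Value for the proposed --sigterm flag.
type sigtermMode string

const (
	sigtermDrain         sigtermMode = "drain"
	sigtermCleanShutdown sigtermMode = "clean-shutdown"
)

func (m *sigtermMode) String() string { return string(*m) }

// Set accepts only the two values named in the proposal.
func (m *sigtermMode) Set(v string) error {
	switch sigtermMode(v) {
	case sigtermDrain, sigtermCleanShutdown:
		*m = sigtermMode(v)
		return nil
	default:
		return fmt.Errorf("invalid --sigterm value %q (want %q or %q)", v, sigtermDrain, sigtermCleanShutdown)
	}
}

// parseSigterm returns the selected mode; the default preserves the
// existing clean-shutdown behavior.
func parseSigterm(args []string) (sigtermMode, error) {
	fs := flag.NewFlagSet("nsqd", flag.ContinueOnError)
	mode := sigtermCleanShutdown
	fs.Var(&mode, "sigterm", "action on SIGTERM: drain or clean-shutdown")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return mode, nil
}

func main() {
	mode, err := parseSigterm(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("sigterm mode:", mode)
}
```

The SIGTERM handler would then branch on the parsed mode: start a drain for `drain`, or exit as it does today for `clean-shutdown`.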
A new `PUT /config/drain` endpoint can also initiate a drain, and a `PUT /config/shutdown` would initiate a clean shutdown.

When in draining mode, new messages will be rejected with an error.
`E_PUB_FAILED` will be the response for new messages over the TCP protocol, and HTTP 503 for the HTTP protocol. An attempt to create new topics and channels (via subscribe) will be rejected if `nsqd` is in drain mode.

Once initiated, a drain operation can only be completed; it can't be canceled. TBD:
`PUT /config/shutdown` may be able to override the drain, closing all connections and exiting `nsqd`.
Open Questions

- … (`POST /topic/delete`)

cc: #1254
Closes #1022