What determines when a replication job is considered 'crashing'? #3550
Hello, I have a question about how an active, continuously running replication job transitions from state 'running' to 'crashing'. My scenario: replication works fine. However, let's assume that the network between A and B is disconnected (for instance, I pull the network cable). 'Host A' continues to show a 'running' state in /_scheduler/jobs/_replicator for anywhere from 2.5 to 10 minutes, until the job finally transitions to 'crashing'. I have configured the local.ini with the following: [replicator] Despite 'errors' in the CouchDB log file due to 'req_timedout', the job takes several minutes to report that it is 'crashing'. How can I make _scheduler/jobs/_replicator report a 'crashing' state sooner, say within 1 minute? Thanks!
Replies: 1 comment 4 replies
With remote connections, unless there is a periodic ping or timeout involved, the socket might not know that the cable was pulled. If the documents have all replicated, for example, we'd only find out that the connection is broken when the `_changes` feed times out. The `timeout` on the changes feed is derived from the `connection_timeout` config parameter, and since you set it to 10000 (10 seconds), it seems you should find out sooner than a minute. It's a good idea to lower `retries_per_request` too.

I think you meant `_scheduler/docs/_replicator`? Maybe monitor the logs and see when you start seeing errors in the log and if you poll `_scheduler/jobs` or `_scheduler/docs` when you start seeing the first sta…
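
To line up the timing of state changes with the errors in the log, one option is a small poller that prints a timestamp whenever a replication document's reported state changes. The following is only a rough sketch in Python, not something from CouchDB itself: `BASE_URL`, the credentials, and all function names are placeholders, and it assumes the `_scheduler/docs/_replicator` response carries a `docs` list whose entries have `doc_id` and `state` fields.

```python
import base64
import json
import time
import urllib.request

# Assumed values; replace with your own node address and admin credentials.
BASE_URL = "http://localhost:5984"
USER, PASSWORD = "admin", "password"


def fetch_scheduler_docs(base_url=BASE_URL, user=USER, password=PASSWORD):
    """GET /_scheduler/docs/_replicator and return the decoded JSON body."""
    req = urllib.request.Request(f"{base_url}/_scheduler/docs/_replicator")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def extract_states(payload):
    """Map each replication doc_id to its reported state."""
    return {d["doc_id"]: d.get("state") for d in payload.get("docs", [])}


def diff_states(previous, current):
    """Return (doc_id, old_state, new_state) for every doc whose state changed."""
    return [(doc_id, previous.get(doc_id), state)
            for doc_id, state in sorted(current.items())
            if previous.get(doc_id) != state]


def watch(interval=5.0):
    """Poll forever, printing a timestamped line whenever a state changes."""
    previous = {}
    while True:
        current = extract_states(fetch_scheduler_docs())
        for doc_id, old, new in diff_states(previous, current):
            print(f"{time.strftime('%H:%M:%S')} {doc_id}: {old} -> {new}")
        previous = current
        time.sleep(interval)
```

Calling `watch()` on host A before pulling the cable, and comparing the printed times with the first `req_timedout` entries in the log, should show how long the scheduler takes to report the new state.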