"No DB shards could be opened" occurring after few minutes of usage in a new cluster #3124
Answered by wohali
tudordumitriu asked this question in Q&A
-
I'm not a k8s expert, but your errors indicate that the nodes could not reach each other.
You may want to try using the CouchDB Helm chart instead of your current setup.
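For reference, a minimal sketch of what switching to the chart could look like, assuming the apache/couchdb-helm repository; the release name, namespace and UUID handling are placeholders and should be checked against the chart's README:

```sh
# Minimal sketch, assuming the apache/couchdb-helm chart; release name,
# namespace and the generated UUID are placeholders.
helm repo add couchdb https://apache.github.io/couchdb-helm
helm repo update
helm install my-couchdb couchdb/couchdb \
  --namespace couchdb --create-namespace \
  --set clusterSize=3 \
  --set couchdbConfig.couchdb.uuid=$(uuidgen | tr -d '-')
```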
-
Sounds like your storage layer isn't getting set up correctly, is dropping offline, isn't available fast enough, or has ongoing issues. Given that it runs fine locally, the problem is definitely something going on in AKS and probably not a CouchDB problem.
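A few kubectl checks can help confirm whether the AKS storage layer is the culprit (namespace, pod and PVC names below are placeholders):

```sh
# Placeholders: adjust namespace, pod and PVC names to your deployment.
kubectl get storageclass
kubectl get pvc -n couchdb                                     # are all claims Bound?
kubectl describe pvc data-couchdb-0 -n couchdb
kubectl get events -n couchdb --sort-by=.lastTimestamp | tail -n 20
kubectl exec -n couchdb couchdb-0 -- df -h /opt/couchdb/data   # is the data volume mounted?
```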
Answer selected by tudordumitriu
-
Description
We have a 3-node CouchDB cluster deployed into Azure AKS (Kubernetes).
There are around 10 databases with an average of 100 docs per database.
After deploying a fresh cluster on top of the existing databases (using the standard Docker image, https://github.com/apache/couchdb-docker, version 3.1.0) and re-initializing the cluster (see the attached couch.cluster.config.sh), we start getting the "No DB shards could be opened" error after a few minutes of usage from various services / clients.
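For context, the re-initialization is roughly along the lines of the sketch below (this is not the attached script; the coordinator URL, node hostnames and credentials are placeholders):

```sh
#!/bin/sh
# Minimal sketch of CouchDB cluster re-initialization via the /_cluster_setup API.
# Not the attached couch.cluster.config.sh: hostnames and credentials are placeholders.
COUCH_USER=admin
COUCH_PASSWORD=secret
COORD="http://couchdb-0.couchdb:5984"

# 1. Enable clustering on the coordinator node.
curl -s -X POST "$COORD/_cluster_setup" -u "$COUCH_USER:$COUCH_PASSWORD" \
  -H "Content-Type: application/json" \
  -d "{\"action\":\"enable_cluster\",\"bind_address\":\"0.0.0.0\",\"port\":5984,
       \"username\":\"$COUCH_USER\",\"password\":\"$COUCH_PASSWORD\",\"node_count\":\"3\"}"

# 2. Enable clustering on each remote node, then add it to the cluster.
for node in couchdb-1.couchdb couchdb-2.couchdb; do
  curl -s -X POST "$COORD/_cluster_setup" -u "$COUCH_USER:$COUCH_PASSWORD" \
    -H "Content-Type: application/json" \
    -d "{\"action\":\"enable_cluster\",\"bind_address\":\"0.0.0.0\",\"port\":5984,
         \"username\":\"$COUCH_USER\",\"password\":\"$COUCH_PASSWORD\",\"node_count\":\"3\",
         \"remote_node\":\"$node\",\"remote_current_user\":\"$COUCH_USER\",
         \"remote_current_password\":\"$COUCH_PASSWORD\"}"
  curl -s -X POST "$COORD/_cluster_setup" -u "$COUCH_USER:$COUCH_PASSWORD" \
    -H "Content-Type: application/json" \
    -d "{\"action\":\"add_node\",\"host\":\"$node\",\"port\":5984,
         \"username\":\"$COUCH_USER\",\"password\":\"$COUCH_PASSWORD\"}"
done

# 3. Finish the setup and verify that all nodes see each other.
curl -s -X POST "$COORD/_cluster_setup" -u "$COUCH_USER:$COUCH_PASSWORD" \
  -H "Content-Type: application/json" -d '{"action":"finish_cluster"}'
curl -s "$COORD/_membership" -u "$COUCH_USER:$COUCH_PASSWORD"
```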
General notes:
- On a local dev k8s cluster the same setup (same images / process) works perfectly.
- We tried some of the settings suggested in #796, except the last one (/etc/systemd/system/couchdb.d/override.conf), which is harder to apply inside a pre-built CouchDB Docker image; we also checked ulimit -n and it seems pretty high (a quick way to verify this from outside the pod is sketched below).
- We are overriding local.ini on prestart with the attached file (local.ini.txt).
- Judging from the CouchDB cluster error logs it doesn't seem to be a resource problem but an auth issue (the logs are attached).
- Overall k8s cluster CPU seems high, but the CPU is not being consumed by the CouchDB machines.
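A quick way to verify the file-descriptor limit the container actually gets (since the systemd override from #796 does not apply inside a pod); namespace and pod name are placeholders:

```sh
# Placeholders: namespace and pod name depend on your deployment.
kubectl exec -n couchdb couchdb-0 -- sh -c 'ulimit -n'
# Limits of the container's PID 1 (the CouchDB beam process inherits them unless changed):
kubectl exec -n couchdb couchdb-0 -- sh -c 'grep "open files" /proc/1/limits'
```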
Steps to Reproduce
- Deploy a new k8s cluster in Azure AKS.
- Configure the CouchDB cluster using the attached script.
- Run a generic API query or use the Fauxton admin UI (see the sketch below); Fauxton also fails with "Database unavailable", but after a refresh it appears OK, which makes me think the cluster nodes were not synced properly.
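For completeness, the kind of query that triggers the error, plus a couple of checks worth running when it appears; the URL, database name and credentials are placeholders:

```sh
# Placeholders: COUCH_URL points at the AKS service/ingress, "somedb" is any of the ~10 databases.
curl -s -u "$COUCH_USER:$COUCH_PASSWORD" "$COUCH_URL/somedb/_all_docs?limit=10"

# When "No DB shards could be opened" appears, check cluster membership and node health:
curl -s -u "$COUCH_USER:$COUCH_PASSWORD" "$COUCH_URL/_membership"
curl -s -u "$COUCH_USER:$COUCH_PASSWORD" "$COUCH_URL/_up"
```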
Expected Behaviour
Your Environment
Azure AKS, 3-node cluster
CouchDB 3.1.0 (official apache/couchdb-docker image)
Additional Context (logs, scripts and config files)
couch2.log
couch0.log
couch1.log
local.ini.txt
couch.cluster.config.sh.txt