"No DB shards could be opened" occurring after few minutes of usage in a new cluster #3124
Answered by wohali
tudordumitriu asked this question in Q&A
-
I'm not a k8s expert, but your errors indicate that the nodes could not reach each other.
You may want to try using the CouchDB Helm chart instead of your current setup.
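For reference, a minimal sketch of what switching to the chart could look like, assuming the apache/couchdb-helm repository; the release name, namespace and UUID handling are placeholders and should be checked against the chart's README:

```sh
# Minimal sketch, assuming the apache/couchdb-helm chart; release name,
# namespace and the generated UUID are placeholders.
helm repo add couchdb https://apache.github.io/couchdb-helm
helm repo update
helm install my-couchdb couchdb/couchdb \
  --namespace couchdb --create-namespace \
  --set clusterSize=3 \
  --set couchdbConfig.couchdb.uuid=$(uuidgen | tr -d '-')
```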
-
Sounds like your storage layer isn't getting set up correctly, is dropping offline, isn't available fast enough, or has ongoing issues. Given that it runs fine locally, the problem is definitely something going on in AKS and probably not a CouchDB problem.
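A few kubectl checks can help confirm whether the AKS storage layer is the culprit (namespace, pod and PVC names below are placeholders):

```sh
# Placeholders: adjust namespace, pod and PVC names to your deployment.
kubectl get storageclass
kubectl get pvc -n couchdb                                     # are all claims Bound?
kubectl describe pvc data-couchdb-0 -n couchdb
kubectl get events -n couchdb --sort-by=.lastTimestamp | tail -n 20
kubectl exec -n couchdb couchdb-0 -- df -h /opt/couchdb/data   # is the data volume mounted?
```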
Answer selected by tudordumitriu
-
Description
We have a 3-node CouchDB cluster deployed into Azure AKS (Kubernetes).
There are around 10 databases with an average of 100 docs per database.
After deploying a fresh cluster on top of the existing databases (using the standard Docker image, https://github.com/apache/couchdb-docker, version 3.1.0) and re-initializing the cluster (see the attached couch.cluster.config.sh), we start getting the "No DB shards could be opened" error after a few minutes of usage from various services / clients.
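For context, the re-initialization is roughly along the lines of the sketch below (this is not the attached script; the coordinator URL, node hostnames and credentials are placeholders):

```sh
#!/bin/sh
# Minimal sketch of CouchDB cluster re-initialization via the /_cluster_setup API.
# Not the attached couch.cluster.config.sh: hostnames and credentials are placeholders.
COUCH_USER=admin
COUCH_PASSWORD=secret
COORD="http://couchdb-0.couchdb:5984"

# 1. Enable clustering on the coordinator node.
curl -s -X POST "$COORD/_cluster_setup" -u "$COUCH_USER:$COUCH_PASSWORD" \
  -H "Content-Type: application/json" \
  -d "{\"action\":\"enable_cluster\",\"bind_address\":\"0.0.0.0\",\"port\":5984,
       \"username\":\"$COUCH_USER\",\"password\":\"$COUCH_PASSWORD\",\"node_count\":\"3\"}"

# 2. Enable clustering on each remote node, then add it to the cluster.
for node in couchdb-1.couchdb couchdb-2.couchdb; do
  curl -s -X POST "$COORD/_cluster_setup" -u "$COUCH_USER:$COUCH_PASSWORD" \
    -H "Content-Type: application/json" \
    -d "{\"action\":\"enable_cluster\",\"bind_address\":\"0.0.0.0\",\"port\":5984,
         \"username\":\"$COUCH_USER\",\"password\":\"$COUCH_PASSWORD\",\"node_count\":\"3\",
         \"remote_node\":\"$node\",\"remote_current_user\":\"$COUCH_USER\",
         \"remote_current_password\":\"$COUCH_PASSWORD\"}"
  curl -s -X POST "$COORD/_cluster_setup" -u "$COUCH_USER:$COUCH_PASSWORD" \
    -H "Content-Type: application/json" \
    -d "{\"action\":\"add_node\",\"host\":\"$node\",\"port\":5984,
         \"username\":\"$COUCH_USER\",\"password\":\"$COUCH_PASSWORD\"}"
done

# 3. Finish the setup and verify that all nodes see each other.
curl -s -X POST "$COORD/_cluster_setup" -u "$COUCH_USER:$COUCH_PASSWORD" \
  -H "Content-Type: application/json" -d '{"action":"finish_cluster"}'
curl -s "$COORD/_membership" -u "$COUCH_USER:$COUCH_PASSWORD"
```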
General notes:
- On a local dev k8s cluster the same setup (same images / process) works perfectly.
- We tried some of the settings suggested in #796, except the last one (/etc/systemd/system/couchdb.d/override.conf), which is harder to apply inside a pre-built CouchDB Docker image; we also checked ulimit -n and it seems pretty high (a quick way to verify this from outside the pod is sketched below).
- We are overriding local.ini on prestart with the attached file (local.ini.txt).
- Judging from the CouchDB cluster error logs it doesn't seem to be a resource problem but an auth issue (the logs are attached).
- Overall k8s cluster CPU seems high, but the CPU is not being consumed by the CouchDB machines.
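A quick way to verify the file-descriptor limit the container actually gets (since the systemd override from #796 does not apply inside a pod); namespace and pod name are placeholders:

```sh
# Placeholders: namespace and pod name depend on your deployment.
kubectl exec -n couchdb couchdb-0 -- sh -c 'ulimit -n'
# Limits of the container's PID 1 (the CouchDB beam process inherits them unless changed):
kubectl exec -n couchdb couchdb-0 -- sh -c 'grep "open files" /proc/1/limits'
```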
Steps to Reproduce
- Deploy a new k8s cluster in Azure AKS.
- Configure the CouchDB cluster using the attached script.
- Run a generic API query or use the Fauxton admin UI (see the sketch below); Fauxton also fails with "Database unavailable", but after a refresh it appears OK, which makes me think the cluster nodes were not synced properly.
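For completeness, the kind of query that triggers the error, plus a couple of checks worth running when it appears; the URL, database name and credentials are placeholders:

```sh
# Placeholders: COUCH_URL points at the AKS service/ingress, "somedb" is any of the ~10 databases.
curl -s -u "$COUCH_USER:$COUCH_PASSWORD" "$COUCH_URL/somedb/_all_docs?limit=10"

# When "No DB shards could be opened" appears, check cluster membership and node health:
curl -s -u "$COUCH_USER:$COUCH_PASSWORD" "$COUCH_URL/_membership"
curl -s -u "$COUCH_USER:$COUCH_PASSWORD" "$COUCH_URL/_up"
```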
Expected Behaviour
Your Environment
Azure AKS, 3-node cluster
CouchDB 3.1.0 (official apache/couchdb-docker image)
Additional Context (logs, scripts and config files)
couch2.log
couch0.log
couch1.log
local.ini.txt
couch.cluster.config.sh.txt