Durable queue down or inaccessible after two out of three nodes are forcefully shut down #2454
-
We are seeing a lot of durable queues stop working after force-rebooting a pod; the logs look like the output below.

Evaluating the same queue record on different nodes shows:

root@cve-rabbitmq-0:/# rabbitmqctl eval 'rabbit_misc:dirty_read({rabbit_queue, rabbit_misc:r(<<"nova">>, queue, <<"compute">>)}).'
{error,not_found}

root@cve-rabbitmq-2:/# rabbitmqctl eval 'rabbit_misc:dirty_read({rabbit_queue, rabbit_misc:r(<<"nova">>, queue, <<"compute">>)}).'
{ok,{amqqueue,{resource,<<"nova">>,queue,<<"compute">>},
true,false,none,
[{<<"x-ha-policy">>,longstr,<<"all">>}],
<10614.7269.0>,[],[],[],
[{vhost,<<"nova">>},
{name,<<"ha_ttl_nova">>},
{pattern,<<"^(?!(amq\\.|reply_)).*">>},
{'apply-to',<<"all">>},
{definition,[{<<"ha-mode">>,<<"all">>},
{<<"ha-sync-mode">>,<<"automatic">>},
{<<"message-ttl">>,70000}]},
{priority,0}],
undefined,
[{<10614.7270.0>,<10614.7269.0>}],
[],live,0,[],<<"nova">>,
#{user => <<"nova">>}}}

The rabbit_queue table is not synced across nodes. There are around 1k queues and exchanges. The way I reproduce the problem is:

# kubectl delete pod -n openstack cve-rabbitmq-1 --force --grace-period 0
# sleep 8
# kubectl delete pod -n openstack cve-rabbitmq-0 --force --grace-period 0

The cluster status reports OK on both nodes:

Cluster status of node rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local ...
[{nodes,
[{disc,
['rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local']}]},
{running_nodes,
['rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local']},
{cluster_name,
<<"rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local">>},
{partitions,[]},
{alarms,
[{'rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]}]}]

root@cve-rabbitmq-0:/# rabbitmqctl cluster_status
Cluster status of node rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local ...
[{nodes,
[{disc,
['rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local']}]},
{running_nodes,
['rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local']},
{cluster_name,
<<"rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local">>},
{partitions,[]},
{alarms,
[{'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]}]}]

We are using RabbitMQ 3.7.26 and OTP 22.3.4.4. For clarity, the policies are listed below:

root@mgt01:~/rabbitmq_cmd# ./rabbitmqadmin -N test list policies
+----------+-----------------+----------+---------------------------------------------------------------------------------------------+-----------------------+----------+
| vhost | name | apply-to | definition | pattern | priority |
+----------+-----------------+----------+---------------------------------------------------------------------------------------------+-----------------------+----------+
| / | ha-all | all | {"ha-sync-mode": "automatic", "ha-mode": "all"} | .* | 0 |
| cinder | ha_ttl_cinder | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
| glance | ha_ttl_glance | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
| heat | ha_ttl_heat | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "queue-mode": "lazy", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
| ironic | ha_ttl_ironic | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "queue-mode": "lazy", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
| keystone | ha_ttl_keystone | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
| masakari | ha_ttl_masakari | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
| neutron | ha_ttl_neutron | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
| nova | ha_ttl_nova | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
| watcher | ha_ttl_watcher | all | {"ha-sync-mode": "automatic", "ha-mode": "all", "message-ttl": 70000} | ^(?!(amq\.|reply_)).* | 0 |
+----------+-----------------+----------+---------------------------------------------------------------------------------------------+-----------------------+----------+
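As an aside, the pattern used by the ha_ttl_* policies above is a negative lookahead: it matches any queue name that does not start with "amq." or "reply_". A quick check (the queue names here are illustrative):

```python
import re

# The pattern from the ha_ttl_* policies above: match any name that does
# NOT begin with "amq." or "reply_".
pattern = re.compile(r"^(?!(amq\.|reply_)).*")

print(bool(pattern.match("compute")))      # True: covered by the policy
print(bool(pattern.match("amq.gen-abc")))  # False: excluded
print(bool(pattern.match("reply_12345")))  # False: excluded
```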
Replies: 9 comments 16 replies
-
We don't know how many replicas there are in total, but two of them were force-deleted. The loss of a majority of replicas for classic mirrored queues is a pretty specific scenario that is covered in a dedicated section of the classic queue mirroring guide.

As far as the node that is handling your Nova connection is concerned, the queue does not exist: the direct table row lookup you ran uses a dirty (node-local) read, so each node answers from its own copy of the table.

With OpenStack in the past, we have seen clients connecting and immediately starting to perform operations on nodes that are not yet 100% booted and synchronized. In part this is because client operations are concurrent with everything else the node may be doing, and in part because client connection listeners were started in the middle of the boot sequence. In 3.8 this has been addressed as part of a group of changes for #2384 (#2406).

RabbitMQ 3.7 is three days away from going entirely out of support. Consider upgrading: 3.8 introduced a new replicated queue type (quorum queues) that focuses on data safety and predictable recovery. However, when the majority of replicas are offline, which may very well be the case in this example, a quorum queue will also become unavailable until a majority of its replicas comes back online.
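The majority requirement above is worth spelling out. This is general Raft-style majority arithmetic, not RabbitMQ code:

```python
# Majority arithmetic for replicated queues: with n replicas,
# floor(n/2) + 1 of them must be online for the queue to stay available.
def majority(n: int) -> int:
    return n // 2 + 1

def is_available(total_replicas: int, online_replicas: int) -> bool:
    return online_replicas >= majority(total_replicas)

# This thread's scenario: 3 nodes, 2 force-deleted, 1 replica left online.
print(is_available(3, 1))  # False: no majority, the queue is unavailable
print(is_available(3, 2))  # True: 2 of 3 replicas form a majority
```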
-
First, thank you for the reply. But I am sorry, I cannot agree with your opinion.
-
The question is whether there is a difference in
-
@jshen28, can full logs from all nodes be shared, from the start of your test to the end of it? Without the logs we won't be able to offer a meaningful theory. We also would investigate in detail only if this can be reproduced with
-
Thank you for the reply. Unfortunately, I did not keep the logs. Using the procedure above, unroutable publishes can be reproduced consistently, but the internal behavior differs between runs. The following output is from a different test run. After some debugging, I found:

root@mgt01:~# kubectl -n openstack exec -it cve-rabbitmq-0 -c rabbitmq-ha -- rabbitmqctl eval 'length(mnesia:dirty_all_keys(rabbit_topic_trie_edge)).'
12
root@mgt01:~# kubectl -n openstack exec -it cve-rabbitmq-2 -c rabbitmq-ha -- rabbitmqctl eval 'length(mnesia:dirty_all_keys(rabbit_topic_trie_edge)).'
720
root@mgt01:~# kubectl -n openstack exec -it cve-rabbitmq-1 -c rabbitmq-ha -- rabbitmqctl eval 'length(mnesia:dirty_all_keys(rabbit_topic_trie_edge)).'
19

while the cluster still reports as healthy:

root@mgt01:~# kubectl -n openstack exec -it cve-rabbitmq-1 -c rabbitmq-ha -- rabbitmqctl cluster_status
Cluster status of node rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local ...
[{nodes,
[{disc,
['rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local']}]},
{running_nodes,
['rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local']},
{cluster_name,
<<"rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local">>},
{partitions,[]},
{alarms,
[{'rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]}]}]
root@mgt01:~# kubectl -n openstack exec -it cve-rabbitmq-2 -c rabbitmq-ha -- rabbitmqctl cluster_status
Cluster status of node rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local ...
[{nodes,
[{disc,
['rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local']}]},
{running_nodes,
['rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local']},
{cluster_name,
<<"rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local">>},
{partitions,[]},
{alarms,
[{'rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]}]}]
root@mgt01:~# kubectl -n openstack exec -it cve-rabbitmq-0 -c rabbitmq-ha -- rabbitmqctl cluster_status
Cluster status of node rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local ...
[{nodes,
[{disc,
['rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local']}]},
{running_nodes,
['rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
'rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local']},
{cluster_name,
<<"rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local">>},
{partitions,[]},
{alarms,
[{'rabbit@cve-rabbitmq-2.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-1.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]},
{'rabbit@cve-rabbitmq-0.cve-rabbitmq-discovery.openstack.svc.cluster.local',
[]}]}]

The test script looks like this:

kubectl delete pod -n openstack --force --grace-period 0 cve-rabbitmq-1
while true; do
ret=`kubectl get pods -n openstack -owide | grep cve-rabbitmq-1 | grep -i Running`
  if [ $? -eq 0 ]; then
sleep .05
kubectl delete pod -n openstack --force --grace-period 0 cve-rabbitmq-0
break
fi
done
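The per-node key counts above can be compared mechanically: every node replicates the same Mnesia tables, so on a healthy cluster the counts should match. A small sketch using the numbers reported in this run:

```python
# Key counts of the rabbit_topic_trie_edge Mnesia table per node,
# as reported earlier in this comment.
counts = {
    "cve-rabbitmq-0": 12,
    "cve-rabbitmq-1": 19,
    "cve-rabbitmq-2": 720,
}

def table_diverged(counts: dict) -> bool:
    # All nodes should report the same key count for a replicated table.
    return len(set(counts.values())) > 1

print(table_diverged(counts))  # True: the table contents differ across nodes
```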
-
Besides, in Kubernetes, deleting a pod will change the container IP. Will that cause some unexpected behavior?
Beta Was this translation helpful? Give feedback.
-
So if a node is starting up and I kill the pod immediately, should it receive a nodedown signal from net_kernel? Currently the net tick time is the default. A force shutdown with a grace period of 0 can bring the pod back up in a very short time, potentially less than half of the tick time; will that also cause issues?
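For reference, Erlang's net_kernel failure detection works off `net_ticktime`: heartbeats are sent every `net_ticktime / 4` seconds, and a peer that misses four consecutive ticks is considered down, so detection takes roughly between `net_ticktime` and `net_ticktime + net_ticktime / 4` seconds (60 to 75 seconds with the default of 60). A pod that is force-killed and restarted faster than that window can come back before its peers ever observe a nodedown. A sketch of the arithmetic:

```python
# net_ticktime arithmetic (Erlang distribution heartbeats): ticks go out
# every net_ticktime / 4 seconds; a peer missing four consecutive ticks is
# declared down, so detection takes roughly between net_ticktime and
# net_ticktime + net_ticktime / 4 seconds.
def detection_window(net_ticktime: float = 60.0):
    tick_interval = net_ticktime / 4
    return (net_ticktime, net_ticktime + tick_interval)

lo, hi = detection_window()
print(lo, hi)  # 60.0 75.0 with the default net_ticktime of 60 seconds
```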
-
I am closing this as resolved since this test
There are doc links in this discussion that explain all the important bits about how nodes rejoin a cluster and what that means specifically on Kubernetes.
-
Adding extra insights in response to: https://www.youtube.com/watch?v=I02oKJlOnR4&lc=Ugyei_s5o0sVFfKce-t4AaABAg

The important ones:
The less important ones:
In conclusion, there is a limit to how much free support we can provide. Considering how much @michaelklishin has helped with this, and the extra insights provided above, @jshen28, if you feel you need more, I would recommend looking at https://www.rabbitmq.com/services.html. Thank you!