[BUG] Deployment error #1176

rumart · 2024-02-26T15:48:51Z

Describe the bug
The VEBA deployment doesn't finish and throws an error when deploying the RabbitMQ cluster

To Reproduce
Steps to reproduce the behavior:
I've deployed the OVA as described in the docs
Waited for around 20 minutes, but none of the web endpoints work (Connection refused)

Expected behavior
The deployment to finish and the endpoints to work

Screenshots
Screenshot of bootstrap-debug.log

Version (please complete the following information):

VEBA Form Factor: Appliance
VEBA Version: v.0.8.0

Additional context
When troubleshooting, I saw that the deployment stopped in what seems to be the setup-05-knative.sh script.

I commented out scripts 1 through 4 in setup.sh and reran setup.sh

After a short while the script stopped with this message:

I checked the setup-05-knative.sh script and found that the VEBA_BOM_FILE variable is defined after it is first used in the file.

The ytt command on line 44 uses $VEBA_BOM_FILE, but the variable is first defined on line 51.

I moved that line above line 44 and reran setup.sh

Now the deployment could finish and I can access the web endpoints
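A minimal sketch of that kind of reordering, assuming the script runs with `set -u` (which would explain the "unbound variable" message). The `resolve_bom_file` helper and the fallback path are hypothetical and not taken from the VEBA scripts:

```shell
#!/bin/bash
set -u  # referencing an unset variable now aborts with "unbound variable"

# Hypothetical helper: keep a value exported by an earlier script,
# otherwise fall back to the path given as the first argument.
resolve_bom_file() {
  echo "${VEBA_BOM_FILE:-$1}"
}

# Define the variable BEFORE the first command that reads it.
# The fallback path is an assumption; the real BOM location is not shown here.
VEBA_BOM_FILE="$(resolve_bom_file /root/config/veba-bom.json)"

echo "Using BOM file: ${VEBA_BOM_FILE}"
```

With this pattern the script can also be re-run on its own, without the earlier script that normally sets the variable.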

github-actions · 2024-02-26T15:49:09Z

Howdy 🖐 rumart ! Thank you for your interest in this project. We value your feedback and will respond soon.

rumart · 2024-02-26T15:51:34Z

Here's a screenshot of kubectl get pods -A before re-running the setup file

rguske · 2024-02-26T16:05:22Z

Hi @rumart, the VEBA_BOM_FILE variable is already set in setup-04-kubernetes.sh for the first time - HERE.
I can see in your screenshot that the installation didn't finish successfully. For example, the vmware-sources namespace is missing. We've faced this issue before and it should actually be fixed with #1170.
We have to dig into it.

rumart · 2024-02-26T16:10:04Z

Yeah, so when I comment out setup-04 it doesn't pick up the BOM variable, but nevertheless, since it gets defined in setup-05, could it just be moved up a bit? Or should it be removed altogether?

Thanks for looking into it

rguske · 2024-02-26T16:56:05Z

I don't think that the issue is caused by not setting the VEBA_BOM_FILE variable. We suspect that it is timing-related. Have you tried deploying it again? What kind of environment are you deploying VEBA to?

rumart · 2024-02-26T17:11:17Z

I agree, the VEBA_BOM_FILE issue is because I've re-run the script without running the setup-04 which sets it the first time. Was more thinking of fixing that setup-05 file separately..

Anyways, I'm running it on a small home lab vSAN cluster. Have tried redeploying a few times, all stopping on the same error message.

I'll try to run it on a different env later tonight to see if that changes anything

rumart · 2024-02-26T20:01:18Z

I've tried on a single ESXi host not running anything else, storage on NVME. I've added more CPU and RAM to the appliance. Still errors out on the same step

I ssh'd to the appliance as soon as it was available and tailed the bootstrap-debug.log. The error failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev" happens after just a couple of minutes. As far as I understand there's a 10 minute timeout on most of the commands?

rguske · 2024-02-27T13:12:25Z

IIRC, the 10 minutes are the default for the kubectl wait command if you don't specify --timeout separately. I really wonder about this issue. I deployed it in my homelab (2-node vSAN cluster) as well and it worked like a charm. Anyway, like I said, William had this issue before as well but reordering the command executions did the trick. When I have time, I'll try to add another wait condition to the script(s)(if necessary!). Thanks @rumart

lamw · 2024-02-27T15:49:47Z

I suspect that the current "wait" conditions are actually passing, unless you log in and it looks to be waiting for the default 10m as mentioned by Robert. If it truly is a timing issue, we can always enhance the OVF properties to allow that to be customizable, but I'm not sure if that's actually the case and we may need some other wait condition. If we can debug this further, Robert, then we can spin up a custom build to verify for @rumart

jm66 · 2024-04-23T20:45:34Z

Just as @rumart, first error I got:

Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.109.26.244:443: connect: connection refused

Second try, I increased the timeout value and kept going.

Third try stumbled upon the following:

/root/setup/setup-05-knative.sh: line 44: VEBA_BOM_FILE: unbound variable

Which I had to work around to keep the installation going.

rguske · 2024-06-12T08:51:58Z

@rumart I owe you a deep apology for not getting back to you earlier. Would you be open to troubleshooting your issue further? I've just added another wait condition to the setup-05-knative.sh script and have built a new appliance test version. I'd love to follow the deployment in your test environment. Maybe we could run a Zoom session?
What really helps to get started is the following approach:

deploy a new VEBA instance but do not power it on
make a snapshot
open a terminal window and use a terminal multiplexer such as tmux
create two terminal windows
power on VEBA and, as soon as it has the IP configured, connect to it via ssh in both windows
run tail -f /var/log/bootstrap-debug.log in one window and watch kubectl get pods -A in the other window
the snapshot can be used to reset VEBA whenever necessary, for example to add a new command to one of the scripts (but you need to be very fast when adding a new command 😉 )

From there you can perfectly follow the progress.
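The two-window setup described above could be scripted roughly like this. It is only a sketch: the `veba_watch` function and the session name are hypothetical, and logging in as root over ssh is an assumption based on the /root paths in the logs.

```shell
#!/bin/bash
# Hypothetical helper: open a two-pane tmux session against the appliance.
# Usage: veba_watch <appliance-ip-or-fqdn>
veba_watch() {
  local host="$1"
  local session="veba-debug"  # hypothetical session name

  # Pane 1: follow the bootstrap log (ssh -t allocates a terminal).
  tmux new-session -d -s "${session}" \
    "ssh -t root@${host} 'tail -f /var/log/bootstrap-debug.log'"

  # Pane 2: refresh the pod list continuously.
  tmux split-window -h -t "${session}" \
    "ssh -t root@${host} 'watch kubectl get pods -A'"

  # Attach so both panes are visible side by side.
  tmux attach-session -t "${session}"
}
```

Calling `veba_watch` with the appliance address right after power-on, once the IP is configured, gives both views at once. After reverting to the snapshot, the same call can be repeated.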

The new build can be downloaded for testing purposes here: DOWNLOAD

rumart · 2024-06-13T06:27:48Z

Thanks @rguske. I've been busy with other things so haven't had the time myself.
I'm very interested in troubleshooting further and getting this up and running.

rguske · 2024-06-13T06:58:16Z

Sure, just let me know when you have the time and ping me on Discord or Slack (CNCF Workspace). Looking forward to finding the root cause.

rumart · 2024-06-13T07:15:47Z

Seems I cannot download the test version.

rguske · 2024-06-13T09:49:18Z

I've authorized you now 👍🏻

benwa · 2024-06-14T19:12:46Z

Just to add in, yesterday, we were on vCenter 7.0.3 and I was able to deploy. Today, after an update to vCenter 8.0.2, I get the same error as @rumart.

rabbitmqcluster.rabbitmq.com/veba-rabbit created
Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.98.98.40:443: connect: connection refused

rguske · 2024-06-17T08:23:48Z

Thanks a lot for your input @benwa. I don't think this issue is related to the vSphere version, since the first "real" interaction with the vCenter Server is at line 22 in script 06, when the VSphereSource gets created. It really seems to be a timing issue. I'm still trying to find out which component probably needs a dedicated wait condition.

benwa · 2024-06-25T17:30:46Z

Welp, I redownloaded the ova from the Flings site and ran a checksum. It was different. Redeployed and I'm all good now.
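For anyone who wants to rule out a corrupted download in the same way, here is a minimal sketch of a checksum comparison. The `verify_sha256` helper is hypothetical, it assumes `sha256sum` is available (as on most Linux systems), and the expected value has to come from wherever the OVA was published:

```shell
#!/bin/bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "${file}" | awk '{print $1}')"
  if [ "${actual}" = "${expected}" ]; then
    echo "checksum OK: ${file}"
    return 0
  fi
  echo "checksum MISMATCH: ${file} (got ${actual})" >&2
  return 1
}
```

A mismatch means the file differs from the published one and should be downloaded again before deploying.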

rumart · 2024-06-26T16:43:09Z

Eh… I still can't deploy it. Even with a new test version provided by @rguske.

rguske · 2024-06-28T09:44:11Z

Issue still exists.

rguske · 2024-06-28T19:32:56Z

@rumart I've now added a sleep 30 to setup-05-knative.sh. I haven't found the problematic part yet. Could you give this version a try? DOWNLOAD.

Thy

rumart · 2024-06-29T14:51:04Z

Now I'm able to deploy successfully. Tested several times without issues

rguske · 2024-06-30T10:18:46Z

Interesting! Thanks a lot for verifying, Rudi. However, I will try to narrow it down. There must be a different way.
We'd really appreciate it if you'd be open to testing further builds. Thy :)

royiversen78 · 2024-07-19T09:28:53Z

First-time VEBA user eager to get this working, but I'm also experiencing this issue
VEBA 0.8.0
vCenter 8.0.3

/var/log/bootstrap-debug.log

Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.105.248.31:443: connect: connection refused

rguske · 2024-07-19T15:59:04Z

Thanks for reporting it. Could you please try the version provided in this comment HERE? Thy

royiversen78 · 2024-07-19T19:27:06Z

That link doesn't work anymore. Google Drive says:

Sorry, the file you have requested does not exist.

Make sure that you have the correct URL and the file exists.

rguske · 2024-07-29T12:57:51Z

I will provide a new link in a bit. I was on vacation and back on the issue now. The issue looks similar to what is described here: https://cert-manager.io/docs/troubleshooting/webhook/

So, it looks to me that the Kubernetes API server is trying to call the rabbitmq-broker-webhook when we are installing the RabbitMQ cluster via kubectl apply -f ${RABBITMQ_CONFIG}.

This happens even though the following is included in our script, which should ensure that everything is in a READY state:

kubectl wait --for=condition=available deploy/rabbitmq-broker-webhook --timeout=${KUBECTL_WAIT} -n knative-eventing
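One possible reading, stated as an assumption rather than a confirmed root cause: the wait checks the Deployment, while the API server reaches the webhook through its Service, so there may be a short window in which the connection is still refused. A minimal sketch of retrying the apply step instead of failing on the first attempt follows. The `retry` helper is hypothetical and not part of the VEBA scripts:

```shell
#!/bin/bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    sleep "${delay}"
    attempt=$((attempt + 1))
  done
}

# Example call (commented out so the sketch has no side effects):
# retry 10 6 kubectl apply -f "${RABBITMQ_CONFIG}"
```

The attempt count and delay in the example call are placeholders. A fixed pause before the apply is simpler, while a retry only waits as long as the webhook actually needs.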

rguske · 2024-07-30T07:37:45Z

@royiversen78 use this LINK temporarily.

royiversen78 · 2024-07-30T09:42:47Z

I'm getting the same issue with this version

Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.108.11.231:443: connect: connection refused

rguske · 2024-10-27T19:47:22Z

@rumart @royiversen78 we added a pause to the installation to ensure that the services it depends on are available.
Changes just got merged. #1268

If you'd like to test its functionality, please DM me (preferred on CNCF Slack) and I will provide you a download link to the OVA.
Thanks

rumart added the bug Something isn't working label Feb 26, 2024

lamw closed this as completed Jun 26, 2024

rguske reopened this Jun 28, 2024

rguske mentioned this issue Oct 23, 2024

Add a pause to the 05-knative.sh as a workaround for #1176 #1268

Merged

15 tasks

lamw added a commit that referenced this issue Oct 27, 2024

Merge pull request #1268 from rguske/issue-1176

f3ba886

Add a pause to the 05-knative.sh as a workaround for #1176
