[BUG] vm-dhcp-controller dhcpserver-agent Pod Fails to Start in Air-Gapped Environment Due to ImagePullPolicy 'Always' Setting #6942

Open
zha0jf opened this issue Nov 5, 2024 · 8 comments
Labels
area/vm-dhcp-controller kind/bug Issues that are defects reported by users or that we know have reached a real release reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one

Comments

@zha0jf

zha0jf commented Nov 5, 2024

Describe the bug
In the prepareAgentPod function in vm-dhcp-controller/pkg/controller/ippool/common.go, an init container is added to the vm-dhcp-agent pod using the image docker.io/library/busybox. No imagePullPolicy is set, so it defaults to Always. As a result, the vm-dhcp-agent's dhcpserver-agent pod gets stuck in the "Init:ImagePullBackOff" status when deployed in an air-gapped environment, even if the busybox:latest image was uploaded offline beforehand. This prevents the vm-dhcp-agent from functioning in air-gapped environments.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy vm-dhcp-controller in an air-gapped environment.
  2. Even after uploading the busybox:latest image offline, the vm-dhcp-agent dhcpserver-agent pod remains stuck in the "Init:ImagePullBackOff" status.

Expected behavior
The vm-dhcp-agent dhcpserver-agent pod should be able to pull the locally uploaded image and start successfully in an air-gapped environment.

Support bundle

Environment

  • Harvester ISO version: harvester-v1.4.0-rc4-amd64.iso

Additional context
I recommend setting ImagePullPolicy = "IfNotPresent" on the init container at line 96 of vm-dhcp-controller/pkg/controller/ippool/common.go so that vm-dhcp-controller works properly in air-gapped environments.
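A minimal Go sketch of the suggested change follows. It is not the actual code in common.go. To keep it self-contained, `Container` and `PullPolicy` are hypothetical local stand-ins that mirror the `Name`, `Image`, and `ImagePullPolicy` fields of `corev1.Container` in `k8s.io/api/core/v1`. The helper `newInitContainer` and the container name are also hypothetical.

```go
package main

import "fmt"

// PullPolicy is a hypothetical local stand-in for corev1.PullPolicy.
type PullPolicy string

const (
	PullAlways       PullPolicy = "Always"
	PullIfNotPresent PullPolicy = "IfNotPresent"
)

// Container is a hypothetical stand-in mirroring a subset of corev1.Container.
type Container struct {
	Name            string
	Image           string
	ImagePullPolicy PullPolicy
}

// newInitContainer (hypothetical) illustrates the proposed fix: set the pull
// policy explicitly instead of relying on the default, which is Always for
// an untagged image.
func newInitContainer(image string) Container {
	return Container{
		Name:            "init-example", // hypothetical name
		Image:           image,
		ImagePullPolicy: PullIfNotPresent,
	}
}

func main() {
	c := newInitContainer("docker.io/library/busybox")
	fmt.Printf("%s uses %s with imagePullPolicy=%s\n", c.Name, c.Image, c.ImagePullPolicy)
}
```

With `IfNotPresent`, the kubelet should use an image already present on the node and contact the registry only when the image is missing.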

@zha0jf zha0jf added kind/bug Issues that are defects reported by users or that we know have reached a real release reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one labels Nov 5, 2024
@starbops
Member

starbops commented Nov 6, 2024

Hi @zha0jf, thanks for reporting. I'm curious why it still reports Init:ImagePullBackOff even though the missing container image was imported to the node. Did you wait long enough for the back-off period to pass? Another quick way to test it is to delete the DHCP agent pod entirely.

@starbops
Member

starbops commented Nov 6, 2024

imagePullPolicy defaults to Always because we specify the busybox image without a tag. As the Kubernetes documentation explains, with Always the kubelet queries the container image registry to resolve the image name to a digest, and it only uses the local copy if an image with that exact digest is already cached. This involves communication between the kubelet and remote registries. That is likely why you keep getting an Init:ImagePullBackOff error even though the container image was manually imported.
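The defaulting rule can be sketched in Go. Per the Kubernetes "Images" documentation, when imagePullPolicy is omitted it defaults to Always if the tag is omitted or is `:latest`, and to IfNotPresent if a digest or any other tag is specified. `defaultPullPolicy` is a hypothetical helper with simplified reference parsing (no validation). It is not the API server's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy approximates how Kubernetes defaults imagePullPolicy
// when the field is omitted. Parsing is simplified and does no validation.
func defaultPullPolicy(image string) string {
	// A digest pins the exact content, so the default is IfNotPresent.
	if strings.Contains(image, "@") {
		return "IfNotPresent"
	}
	// Look for a tag only in the last path component, so a registry port
	// such as "registry.local:5000/busybox" is not mistaken for a tag.
	name := image[strings.LastIndex(image, "/")+1:]
	tag := ""
	if i := strings.LastIndex(name, ":"); i >= 0 {
		tag = name[i+1:]
	}
	if tag == "" || tag == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	for _, img := range []string{
		"docker.io/library/busybox", // untagged, as in the reported init container
		"docker.io/library/busybox:latest",
		"docker.io/library/busybox:1.36",
		"registry.local:5000/busybox",
	} {
		fmt.Printf("%-35s -> %s\n", img, defaultPullPolicy(img))
	}
}
```

The untagged `docker.io/library/busybox` falls into the Always case. Pinning a non-latest tag or a digest, or setting the policy explicitly, avoids the registry lookup.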

Your fix in harvester/vm-dhcp-controller#37 can resolve the issue. However, it also reminds us that we should only use container images that are packed in the ISO image. I'd say that's the root cause of the problem. I'm considering adding iproute2 to the rancher/harvester-vm-dhcp-agent image and making the init container use it. We'll take care of that in follow-up PRs. Thank you.

@zha0jf
Author

zha0jf commented Nov 6, 2024

That will be great. Thank you.

@harvesterhci-io-github-bot

harvesterhci-io-github-bot commented Nov 7, 2024

Pre Ready-For-Testing Checklist

  • If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?

    • The automation skeleton PR is at:
    • The automation test case PR is at:
  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at:

@harvesterhci-io-github-bot

Automation e2e test issue: harvester/tests#1652

@starbops
Member

Hi @zha0jf, we've published harvester-vm-dhcp-controller 0.3.3. Would you like to try it and see whether it solves the problem?

https://github.com/harvester/experimental-addons/blob/main/harvester-vm-dhcp-controller/harvester-vm-dhcp-controller.yaml

Thank you.

@zha0jf
Author

zha0jf commented Nov 11, 2024

Hi @starbops, I just tested harvester-vm-dhcp-controller 0.3.3 in a harvester-v1.4.0-rc5 air-gapped environment, and the issue has been resolved. Thank you.

@starbops
Member

@zha0jf That's great news! Thanks again for spotting the issue and sending us a patch :)

Note: please don't close this issue yet. We have our pipelines and will take care of it.

Projects
Status: Resolved/Scheduled
Development

No branches or pull requests

4 participants