
feat: recreate agent pods if agent image is updated #17

Merged: 1 commit into harvester:main on Feb 7, 2024

Conversation

@starbops (Member) commented on Jan 31, 2024

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Problem:

Currently, upgrading the vm-dhcp-controller does not upgrade the agents simultaneously, resulting in inconsistent versions between the controller and the agent.

Solution:

The IPPool object will contain additional fields holding information about its agent Pod. The controller will also propagate agent image name changes to all IPPool objects and recreate the agent Pods with the up-to-date image, so that every running agent matches the agent Pod reference recorded in its corresponding IPPool object.
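
To make that concrete, here is a minimal sketch of the extra status fields, with illustrative names. Only the Image and UID fields are confirmed by the diff discussed later; the package name and the Namespace and Name fields are assumptions.

package v1alpha1

import "k8s.io/apimachinery/pkg/types"

// PodReference identifies the agent Pod backing an IPPool. Field names are
// illustrative; Image and UID appear in the diff below, the rest is assumed.
type PodReference struct {
    Namespace string    `json:"namespace,omitempty"`
    Name      string    `json:"name,omitempty"`
    Image     string    `json:"image,omitempty"`
    UID       types.UID `json:"uid,omitempty"`
}

// IPPoolStatus carries an AgentPodRef so the controller can detect when the
// running agent Pod drifts from the expected image and recreate it.
type IPPoolStatus struct {
    // ...other status fields elided...
    AgentPodRef *PodReference `json:"agentPodRef,omitempty"`
}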

Related Issue:

harvester/harvester#5058

Test plan:

$ kubectl -n harvester-system get deploy harvester-vm-dhcp-controller -o jsonpath='{.spec.template.spec.containers[0].image}'
rancher/harvester-vm-dhcp-controller:main-head

$ kubectl -n harvester-system get pods default-net-48-agent -o jsonpath='{.spec.containers[0].image}'  
rancher/harvester-vm-dhcp-agent:main-head

Upgrade the vm-dhcp-controller chart:

helm upgrade --install harvester-vm-dhcp-controller ./chart -f values.yaml --namespace=harvester-system --create-namespace

After the upgrade, check if the agent Pods are using the newer image:

$ kubectl -n harvester-system get deploy harvester-vm-dhcp-controller -o jsonpath='{.spec.template.spec.containers[0].image}'
starbops/harvester-vm-dhcp-controller:agent-upgrade-head

$ kubectl -n harvester-system get pods default-net-48-agent -o jsonpath='{.spec.containers[0].image}'                       
starbops/harvester-vm-dhcp-agent:agent-upgrade-head

@@ -375,6 +377,10 @@ func (h *Handler) MonitorAgent(ipPool *networkv1.IPPool, status networkv1.IPPool
return status, err
}

if agentPod.GetUID() != ipPool.Status.AgentPodRef.UID || agentPod.Spec.Containers[0].Image != ipPool.Status.AgentPodRef.Image {

Maybe compare the container image with the agent image name stored in the controller?

Suggested change
if agentPod.GetUID() != ipPool.Status.AgentPodRef.UID || agentPod.Spec.Containers[0].Image != ipPool.Status.AgentPodRef.Image {
if agentPod.GetUID() != ipPool.Status.AgentPodRef.UID || agentPod.Spec.Containers[0].Image != h.agentImage.String() {
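
For context, a rough sketch of how that suggestion could sit inside MonitorAgent. It assumes the handler exposes the expected image via h.agentImage.String() (as in the suggestion above) and deletes Pods through a hypothetical h.pods client; this is a sketch, not the merged code.

// Sketch only: delete the tracked agent Pod when it no longer matches either
// the recorded UID or the image the controller currently ships.
if agentPod.GetUID() != ipPool.Status.AgentPodRef.UID ||
    agentPod.Spec.Containers[0].Image != h.agentImage.String() {
    // Deleting the Pod lets the register reconcile loop recreate it with
    // the up-to-date agent image.
    if err := h.pods.Delete(agentPod.Namespace, agentPod.Name, &metav1.DeleteOptions{}); err != nil {
        return status, err
    }
    return status, nil
}
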
How I tested

I checked out the PR on a test1 branch, built and published the images to my repo, and edited the values file:

image:
  repository: bk201z/harvester-vm-dhcp-controller
  pullPolicy: Always
  # Overrides the image tag whose default is the chart appVersion.
  tag: "test1-head"

agent:
  image:
    repository: bk201z/harvester-vm-dhcp-agent
    pullPolicy: Always
    tag: "test1-head"

webhook:
  replicaCount: 1
  image:
    repository: bk201z/harvester-vm-dhcp-webhook
    pullPolicy: Always
    tag: "test1-head"

Install the chart and create an IPPool:

helm upgrade --install harvester-vm-dhcp-controller ./chart --namespace=harvester-system --create-namespace

Check out branch test2, re-publish the images, and edit the values file again:

image:
  repository: bk201z/harvester-vm-dhcp-controller
  pullPolicy: Always
  # Overrides the image tag whose default is the chart appVersion.
  tag: "test2-head"

agent:
  image:
    repository: bk201z/harvester-vm-dhcp-agent
    pullPolicy: Always
    tag: "test2-head"

webhook:
  replicaCount: 1
  image:
    repository: bk201z/harvester-vm-dhcp-webhook
    pullPolicy: Always
    tag: "test2-head"

And then upgrade the chart:

helm upgrade --install harvester-vm-dhcp-controller ./chart --namespace=harvester-system --create-namespace

The agent pod does not seem to be restarted.

@starbops (Member Author) replied:

Since the agent image information is an internal controller variable (not part of any object's spec), we need to reflect it in the object status first (done by the ippool-register reconcile loop), and then rely on the other reconcile loop to sync the status with the real pod.

It's the ippool-agent-monitor reconcile loop's mission to ensure that the running agent pod does not drift away. If it finds the agent pod it tracks does not match what's recorded in the AgentPodRef, it deletes the pod.

Thank you for pointing out the bug where the AgentPodRef is not updated if the agent pod already exists. I've updated the relevant part. Please take a look :)
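
Putting that reply into code, here is a rough sketch of the ippool-register step described above. The helper and client names (podCache, pods, prepareAgentPod, agentPodName) are hypothetical, and the PodReference fields follow the illustrative sketch earlier in this page.

// Sketch of the register reconcile step: ensure the agent Pod exists and, in
// either case, reflect the controller's current agent image into the status so
// the ippool-agent-monitor loop has an up-to-date reference to compare against.
agentPod, err := h.podCache.Get(ipPool.Namespace, agentPodName(ipPool))
if apierrors.IsNotFound(err) {
    agentPod, err = h.pods.Create(prepareAgentPod(ipPool, h.agentImage))
}
if err != nil {
    return status, err
}

// Covers the bug mentioned above: update AgentPodRef even when the agent Pod
// already exists, not only when it is first created.
status.AgentPodRef = &networkv1.PodReference{
    Namespace: agentPod.Namespace,
    Name:      agentPod.Name,
    Image:     h.agentImage.String(),
    UID:       agentPod.GetUID(),
}
return status, nil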

Yu-Jack (Contributor) previously approved these changes on Feb 6, 2024:

LGTM, thanks!

bk201 (Member) previously approved these changes on Feb 6, 2024:

lgtm!

Signed-off-by: Zespre Chang <zespre.chang@suse.com>
@starbops merged commit 6fa17f3 into harvester:main on Feb 7, 2024.
5 checks passed