Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(service): listen to endpoint changes #5085

Merged
merged 12 commits into from
Feb 18, 2025

Conversation

dmarkhas
Copy link
Contributor

Currently, using the Service source, ExternalDNS reconciles the endpoints only when the service definition itself changes.
However, in many cases the desired behavior is to trigger the reconcile when the endpoints themselves change, even if there was no change to the Service object itself.
Please see #4907 for a more elaborate description of the issue this PR aims to solve.

While the same behavior can be effectively achieved today by setting interval=1s, that would cause too many unnecessary reconciliations; the proposed solution ensures a reconcile happens only when endpoints change, and as soon as they change.

Fixes #4907

Checklist

  • Unit tests updated
  • End user documentation updated

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 10, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @dmarkhas. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 10, 2025
# Conflicts:
#	docs/sources/service.md
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 10, 2025
@dmarkhas dmarkhas changed the title React to endpoint changes feat: React to endpoint changes Feb 10, 2025
@ivankatliarchuk
Copy link
Contributor

Thanks for working on a fix.

Worth to consider 3 options

  1. Flag as it currently implemented; default value true|false
  2. Annotation supported on service
  3. No new flags and no annotation, as it should be supported with flags
--[no-]events
--source=service

@dmarkhas
Copy link
Contributor Author

Thanks for the feedback, the guiding principle for me was that this is a breaking change so it must be enabled explicitly.
I would not want anyone who currently uses --events --source=service to suddenly start seeing massive reconcile cycles that he isn't interested in, and has no way to disable.
Thus, the addition of a new flag with a default value of false makes sense to me as a user, and allows for easy migration (so that in future versions we can make it default to true and ultimately remove it, if we decide to).

Adding a new annotation to Service objects would make the implementation less straight-forward, as we would need to track the ownership of Endpoints back to their Service, and there's no fool-proof way of doing that I believe (there's no ownerReferences for Endpoints).

All in all I think the new flag achieves the desired result with minimum changes and aligns well with the current design and implementation of the controller.

@ivankatliarchuk
Copy link
Contributor

Got it. I'll add this functionality to be supported by an annotation to my list of TODO, agree quite a significant change.

I leave a decision where we need a specific flag for that for @mloiseleur, sounds like a safer thing to do, is just this is exactly the reason for --[no-]events to exist.

@ivankatliarchuk
Copy link
Contributor

Could you please also provide a way to test this manually with manifests and kubectl commands? Was this change tested on a cluster?

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 11, 2025
@dmarkhas
Copy link
Contributor Author

It has, we're running this code on one of our clusters currently.

Would manual steps with Route 53 CLI commands showing the records are updated as soon as endpoints are added/removed be OK?

@ivankatliarchuk
Copy link
Contributor

Would manual steps with Route 53 CLI commands showing the records are updated as soon as endpoints are added/removed be OK?

Yes, that should work for sure

@dmarkhas
Copy link
Contributor Author

Great, let me know if this works (I redacted some identifiers like our hosted zone name and ID).

  • Create a Namespace, Deployment and a Service, annotated with external-dns.alpha.kubernetes.io/hostname:
apiVersion: v1
kind: Namespace
metadata:
  name: external-dns-demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mydemo-deployment
  namespace: external-dns-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mydemo
  template:
    metadata:
      labels:
        app: mydemo
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sh", "-c", "while true; do sleep 3600; done"]
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: mydemo-service
  namespace: external-dns-demo
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "external-dns-test.euw1.stgv2.domain”
spec:
  type: ClusterIP
  clusterIP: None
  clusterIPs:
    - None
  selector:
    app: mydemo
  • Apply the objects to the cluster
    kubectl apply -f external-dns-demo.yaml

  • Ensure the A record external-dns-test.euw1.stgv2.domain was created and contains 1 entry

aws route53 list-resource-record-sets \
  --hosted-zone-id ABCDEFG12345 \
  --query "ResourceRecordSets[?Name=='external-dns-test.euw1.stgv2.domain.' && Type=='A']" \
  --output text
external-dns-test.euw1.stgv2.domain.     300     A
RESOURCERECORDS 10.244.136.249
  • Scale the deployment to 5 replicas
kubectl scale deployment mydemo-deployment --replicas=5 -n external-dns-demo
  • Ensure the A record external-dns-test.euw1.stgv2.domain now contains 5 entries
aws route53 list-resource-record-sets \
  --hosted-zone-id ABCDEFG12345 \
  --query "ResourceRecordSets[?Name=='external-dns-test.euw1.stgv2.domain.' && Type=='A']" \
  --output text
external-dns-test.euw1.stgv2.domain.     300     A
RESOURCERECORDS 10.242.222.129
RESOURCERECORDS 10.242.58.6
RESOURCERECORDS 10.242.85.128
RESOURCERECORDS 10.244.136.249
RESOURCERECORDS 10.244.20.70
  • Scale the deployment down to 2 replicas
kubectl scale deployment mydemo-deployment --replicas=5 -n external-dns-demo
  • Ensure the A record external-dns-test.euw1.stgv2.domain now contains 2 entries
aws route53 list-resource-record-sets \
  --hosted-zone-id ABCDEFG12345 \
  --query "ResourceRecordSets[?Name=='external-dns-test.euw1.stgv2.domain.' && Type=='A']" \
  --output text
external-dns-test.euw1.stgv2.domain.     300     A
RESOURCERECORDS 10.242.222.129
RESOURCERECORDS 10.242.58.6

@ivankatliarchuk
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 13, 2025
@ivankatliarchuk
Copy link
Contributor

ivankatliarchuk commented Feb 13, 2025

For some reason I keep getting this

FATA[0000] flag parsing error: unknown long flag '--listen-endpoint-events'

^ resolved. was incorrect branch

I cloned repo, and this is my arguments.

go run main.go \
    --provider=aws \
    --registry=txt \
    --listen-endpoint-events \
    --source=service \
    --aws-zone-type=private \
    --log-level=info

@ivankatliarchuk
Copy link
Contributor

ivankatliarchuk commented Feb 13, 2025

I tested/validate the actual behavior. The zones and everything created just for this test, so if anything leaked like IDs no security risks

OK if record is pointing to unknown domain
Screenshot 2025-02-13 at 10 14 44

The example shared is headles service with clusterIp type

Single replica deployment
ex-dns-single-replica-deployment

Replicas scaled to 5

ext-dns-records-added-when-replicas-scaled

Replica scaled to 0

scaled-back-in-no-replicas

ext DNS logs

ext-dns-logs-v1
more-ext-dns-logs

@ivankatliarchuk
Copy link
Contributor

@dmarkhas you need to fix tests and make linter happy. Dev guide https://github.com/kubernetes-sigs/external-dns/blob/master/docs/contributing/dev-guide.md

I'll review the code when green

@dmarkhas
Copy link
Contributor Author

dmarkhas commented Feb 14, 2025 via email

@dmarkhas
Copy link
Contributor Author

The tests are good now. I also updated the linter configuration which was invalid.

@ivankatliarchuk
Copy link
Contributor

On edge with new flag vs --events + --source=service. Otherwise lgtm

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2025
docs/sources/service.md Outdated Show resolved Hide resolved
@dmarkhas
Copy link
Contributor Author

As a user, I don't think --events + --source=service is a good approach because it would be a significant change from the current behavior.

A more elegant approach could be to add --source=endpoints, which would be very explicit as to the expected behavior of the controller, but it's not possible for users to add external-dns annotations on endpoints objects (the endpoints controller unfortunately does not propagate annotations from the Service, only labels), and it's not possible to accurately establish the ownership relationship of endpoints to their respective service.

However, if we switched to EndpointSlices the ownership issue would be solved, and then we can extract the relevant annotations from the owner Service, but I think that would be a far-reaching change in the controller.

docs/sources/service.md Outdated Show resolved Hide resolved
Co-authored-by: Michel Loiseleur <97035654+mloiseleur@users.noreply.github.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 16, 2025
dmarkhas and others added 2 commits February 18, 2025 18:42
Co-authored-by: Michel Loiseleur <97035654+mloiseleur@users.noreply.github.com>
Copy link
Contributor

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

/assign @mloiseleur

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2025
@mloiseleur
Copy link
Contributor

/retitle feat(service): listen to endpoint changes

@k8s-ci-robot k8s-ci-robot changed the title feat: React to endpoint changes feat(service): listen to endpoint changes Feb 18, 2025
@mloiseleur
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ivankatliarchuk, mloiseleur

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 18, 2025
@k8s-ci-robot k8s-ci-robot merged commit d01586b into kubernetes-sigs:master Feb 18, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DNS updates to trigger on changes to selector targets, not only to resource with external-dns annotations
4 participants