Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use polling during startup to wait for informers to sync #6614

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

tsaarni
Copy link
Member

@tsaarni tsaarni commented Aug 16, 2024

This change will use polling for HasSynced() call to align with how client-go's WaitForCacheSync() operates. Previously it was only called when we received objects from the informer. By changing to polling, we prevent a race condition where Contour fails to recognize that the final object in the initial list has been received, which would prevent the xDS server from ever starting.

Fixes #6613

Details

The startup of a follower instance depends on informers signaling that they have finished synchronizing resources with the API server. We use the SingleFileTracker from client-go to track the processing of initial objects. This tracker increments when processing starts on a resource and decrements when its finished.

The HasSynced() method should ideally report synchronization status true right after Finished() is called for the last resource in the initial list (tracker reaches zero). However, in some cases it fails to report true immediately, causig the boolean trigger that starts the xDS server to never be set.

In case of a leader instance, an extra update triggered by the leader selection itself has ensured that the xDS server starts.

@tsaarni tsaarni requested a review from a team as a code owner August 16, 2024 16:03
@tsaarni tsaarni requested review from skriss and sunjayBhatia and removed request for a team August 16, 2024 16:03
@sunjayBhatia sunjayBhatia requested review from a team, rajatvig and izturn and removed request for a team August 16, 2024 16:03
@tsaarni tsaarni added the release-note/small A small change that needs one line of explanation in the release notes. label Aug 16, 2024
@tsaarni tsaarni force-pushed the poll-hassynced branch 2 times, most recently from 3046ee2 to 7693441 Compare August 16, 2024 16:13
@tsaarni tsaarni added the do not merge Don't merge this PR until this label is removed. label Aug 16, 2024
Copy link

codecov bot commented Aug 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.04%. Comparing base (95a8ab2) to head (63fcd50).
Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6614      +/-   ##
==========================================
+ Coverage   81.01%   81.04%   +0.02%     
==========================================
  Files         133      133              
  Lines       19997    20001       +4     
==========================================
+ Hits        16201    16210       +9     
+ Misses       3503     3498       -5     
  Partials      293      293              
Files with missing lines Coverage Δ
internal/contour/handler.go 83.43% <100.00%> (+3.43%) ⬆️

@tsaarni tsaarni removed the do not merge Don't merge this PR until this label is removed. label Aug 19, 2024
@tsaarni
Copy link
Member Author

tsaarni commented Aug 19, 2024

Fix after test case failure. Coverage target is bit difficult to reach for this one.

Copy link

github-actions bot commented Sep 4, 2024

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Ensure your PR is passing all CI checks. PRs that are fully green are more likely to be reviewed. If you are having trouble with CI checks, reach out to the #contour channel in the Kubernetes Slack workspace.
  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 4, 2024
@tsaarni tsaarni removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 15, 2024
Copy link

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Ensure your PR is passing all CI checks. PRs that are fully green are more likely to be reviewed. If you are having trouble with CI checks, reach out to the #contour channel in the Kubernetes Slack workspace.
  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 30, 2024
@tsaarni tsaarni removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 30, 2024
Copy link

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Ensure your PR is passing all CI checks. PRs that are fully green are more likely to be reviewed. If you are having trouble with CI checks, reach out to the #contour channel in the Kubernetes Slack workspace.
  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2024
@tsaarni tsaarni removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2024
The "upstream" informer trackers do not always return HasSynced() == true
immediately, even if we have already processed all events the informer
dispatched us.  Change to polling so that we can progress also in that case.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
@tsaarni tsaarni merged commit 96dc403 into projectcontour:main Oct 23, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/small A small change that needs one line of explanation in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Contour occasionally gets stuck in a non-ready state and fails to start the XDS server
2 participants