Router VIP Exposure Gated on LB Distribution Readiness #87

@zolug

Problem

The router container announces known VIPs via BGP as soon as BGP sessions are established, and VIPs learned later are likewise tied only to BGP session status. In neither case does the router check whether the collocated LB container has active endpoints to distribute traffic to.
This can cause traffic loss during LB Pod startup or scale-out: external traffic is attracted before the LB can forward it.

See: constraints-and-limitations.md

Goal

The router controller must gate VIP advertisement on the collocated LB container's distribution readiness. VIPs should only be exposed via BIRD when the LB has at least one DistributionGroup with ready endpoints.

Mechanism

The LB controller already writes per-DistributionGroup readiness files to a shared emptyDir volume:

  • Creates: /var/run/meridio/lb-ready- when a DG has ready endpoints
  • Removes: the file when its DG is deleted or has no ready endpoints
  • Cleans up all files on startup

The router container currently does not mount this volume.

Implementation Steps

  1. Mount the shared volume in the router container

    • Add lb-run volumeMount at /var/run/meridio (readOnly) to the router container in config/templates/lb-deployment.yaml
  2. Watch readiness files as an event source

    • Use controller-runtime's WatchesRawSource builder option to integrate file system events (e.g., via fsnotify) as a non-CR event source into the existing reconcile loop
    • File creation/deletion triggers a reconcile of the Gateway
  3. Gate VIP exposure on readiness

    • During reconcile, check for presence of lb-ready-* files in /var/run/meridio/
    • If no readiness files exist: configure BIRD with an empty VIP list (no routes advertised)
    • If at least one readiness file exists: configure BIRD with VIPs from Gateway.status.addresses as today
  4. Incremental approach

    • Start with a simple file presence check (no fsnotify): poll or check during reconcile
    • Evolve to WatchesRawSource + fsnotify for event-driven reactivity
    • For initial development, a dummy file can replace actual LB input to unblock router-side work

Considerations

  • The readiness directory path should be configurable (flag/env), matching the LB controller's readinessDir
  • The router should handle the case where the shared volume is not mounted (backward compatibility / standalone testing) by defaulting to "LB always ready" behavior
  • Per-DG granularity is available (individual lb-ready- files), but for the MVP it is sufficient to treat the presence of any readiness file as "ready"
  • Future refinement: per-VIP gating. Since the router operates at L3 and doesn't need to understand DG-to-VIP mappings, the LB could expose per-VIP readiness files directly (e.g., lb-vip-ready-) instead of per-DG files. This keeps the router's logic simple: it only needs to check which VIPs are ready, not resolve DG → L34Route → VIP chains
