Use go-control-plane's NewServer() and SnapshotCache system to improve ADS #2683
Comments
Incoming WIP PR this afternoon!
BTW, a working branch for this has been available since time immemorial (eduser25@f4de97b). Development and exploration have been put on hold, as there are some shortcomings in the ADS cache implementation of go-control-plane that have to be properly addressed first.
@eduser25 could you update this issue with details on "there are some shortcomings in the ADS cache implementation of go-control-plane"?
Context: envoyproxy/go-control-plane#399
Update:
@steeling Can we use this issue as the root for the subtasks of the snapshot cache work?
@shashankram Moving the conversation about the snapshot cache here. Looking at the GitHub description (https://github.com/envoyproxy/go-control-plane), I'm not entirely sure the snapshot cache is smart enough to do internal diffs to determine when a change is needed. Instead, we set a new version on each update (even if the configs haven't changed), and I believe the snapshot cache will push out the requested changes regardless. So in that sense, the snapshot cache will not reduce the load on the osm-controller. Happy to be wrong about this, but that's my understanding from some brief reading.
Yes, I believe that's fine, because the Envoy instance will compute the diff anyway. In that regard, the value of using the snapshot cache is avoiding our own controller implementation of the xDS state machine and letting the cache deal with it. We've seen quite a few bugs in the past with the xDS state machine implementation in the controller, and using the snapshot cache will simplify what we need to worry about after generating the config. Moreover, I see value in simply writing the config to the cache and forgetting about what happens after that, which isn't the case today.
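To make the "write and forget" flow concrete, here is a minimal sketch against go-control-plane's v3 packages. Exact signatures vary across go-control-plane versions, and the package and helper names (`xdscache`, `setProxySnapshot`) are illustrative, not our actual code:

```go
package xdscache

import (
	"context"

	"github.com/envoyproxy/go-control-plane/pkg/cache/types"
	cachev3 "github.com/envoyproxy/go-control-plane/pkg/cache/v3"
	resourcev3 "github.com/envoyproxy/go-control-plane/pkg/resource/v3"
)

// setProxySnapshot (hypothetical helper) writes one uniformly versioned
// snapshot for a single proxy (nodeID) and forgets about it; the cache
// handles the xDS stream state machine from there.
func setProxySnapshot(ctx context.Context, c cachev3.SnapshotCache, nodeID, version string,
	clusters, endpoints, routes, listeners []types.Resource) error {
	snap, err := cachev3.NewSnapshot(version, map[resourcev3.Type][]types.Resource{
		resourcev3.ClusterType:  clusters,
		resourcev3.EndpointType: endpoints,
		resourcev3.RouteType:    routes,
		resourcev3.ListenerType: listeners,
	})
	if err != nil {
		return err
	}
	// Pushes are keyed off the version string, not a content diff: a new
	// version with identical resources is still pushed, and Envoy computes
	// the effective diff on its side.
	return c.SetSnapshot(ctx, nodeID, snap)
}
```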
Agreed! Curious for your opinion: once we migrate to the snapshot cache, should we keep the messaging.Broker so we know when to trigger config generation on changes, or purely rely on a continuous loop that constantly triggers config generation?
I don't think we should have a continuous loop to trigger config generation, but rather a combination of an event-driven approach with coalescing, plus a periodic reconciler such as a Ticker if necessary to recover from transient inconsistencies in the system. Constantly generating config in a tight loop when there's no change in the system has the downside of both the controller and Envoy operating at 100% CPU all the time. This is especially evident in at-scale environments with thousands of pods. The broker is really meant to provide a streamlined mechanism for event pub-sub, with additional metrics and checks in place regarding which events are broadcast, multicast, or unicast. The broker is still necessary in that regard, given the snapshot cache isn't smart enough to drop events it doesn't need.
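For illustration, a minimal sketch of event coalescing plus a periodic reconcile tick. The coalescing window, reconcile period, and function names are assumptions for the sketch, not the broker's actual API:

```go
package main

import (
	"log"
	"time"
)

// runConfigLoop coalesces bursts of change events into a single config
// generation and adds a periodic reconcile tick. The window and period
// below are assumed values.
func runConfigLoop(events <-chan struct{}, generate func()) {
	const window = 500 * time.Millisecond
	reconcile := time.NewTicker(5 * time.Minute)
	defer reconcile.Stop()

	var fire <-chan time.Time
	for {
		select {
		case _, ok := <-events:
			if !ok {
				return
			}
			if fire == nil {
				// Open a coalescing window; events arriving before it
				// fires are absorbed into the same generation pass.
				fire = time.After(window)
			}
		case <-fire:
			fire = nil
			generate()
		case <-reconcile.C:
			// Recover from transient inconsistencies even with no events.
			generate()
		}
	}
}

func main() {
	events := make(chan struct{}, 64)
	go runConfigLoop(events, func() { log.Println("regenerating xDS config") })
	events <- struct{}{} // e.g. an endpoint change
	time.Sleep(time.Second)
}
```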
Yes it does!
Fixed in #5056
Leverage go-control-plane's SnapshotCache.
The snapshot cache allows us to set a single, uniformly versioned snapshot of all xDS responses for each proxy in a single location, while the snapshot cache implementation maintains all the state and processing needed to respond to the actual gRPC requests. It frees us from worrying about the implementation details of delta vs. aggregate updates, streams vs. fetch, etc.
From @shashankram: we have had issues in the past with the state logic for responding to proxies' stream resource requests, and it is very difficult to debug. Offloading this to an upstream implementation should lighten our load substantially.
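For reference, wiring the snapshot cache into go-control-plane's NewServer() and the ADS gRPC service looks roughly like this. This is a hedged sketch based on the v3 packages; the listen port is hypothetical:

```go
package main

import (
	"context"
	"net"

	discoveryv3 "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3"
	cachev3 "github.com/envoyproxy/go-control-plane/pkg/cache/v3"
	serverv3 "github.com/envoyproxy/go-control-plane/pkg/server/v3"
	"google.golang.org/grpc"
)

func main() {
	ctx := context.Background()

	// ads=true keeps resource ordering consistent on a shared ADS stream;
	// IDHash keys cached snapshots by the Envoy node ID. A nil logger and
	// nil callbacks are accepted by these constructors.
	snapshotCache := cachev3.NewSnapshotCache(true, cachev3.IDHash{}, nil)
	xdsServer := serverv3.NewServer(ctx, snapshotCache, nil)

	grpcServer := grpc.NewServer()
	discoveryv3.RegisterAggregatedDiscoveryServiceServer(grpcServer, xdsServer)

	lis, err := net.Listen("tcp", ":15128") // hypothetical ADS port
	if err != nil {
		panic(err)
	}
	if err := grpcServer.Serve(lis); err != nil {
		panic(err)
	}
}
```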