This repository was archived by the owner on Jun 10, 2024. It is now read-only.

Create sample HA deployment #127

Closed
cmgrote opened this issue May 10, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@cmgrote
Member

cmgrote commented May 10, 2021

Create a sample chart demonstrating a high-availability deployment of the Crux repository:

  • 2-3 OMAG Server Platform pods
  • Each configured with the same configuration document for the Crux repo config
  • Each Crux repo config using a local (e.g. RocksDB) index, but pointing to the same "remote" (OMAG-external) document store and transaction log
  • Probably simplest to start with just Kafka as this external store (for both document store and transaction log)

Configure the Kafka polling latency to 10-50 ms rather than a full second, so that the default sync-index behaviour is not degraded too much by the polling interval.
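A minimal sketch of the per-pod Crux configuration described above, written as a Python helper that emits the JSON document every pod would share. The key and module names (`crux/index-store`, `crux.rocksdb/->kv-store`, `crux.kafka/->document-store`, `crux.kafka/->tx-log`, `poll-wait-duration`) follow the Crux JSON configuration format as best recalled and should be checked against the Crux docs for the release in use. The Kafka address, the `data/index-store` path and the `crux_config` helper itself are hypothetical.

```python
import json

# Hypothetical address of the shared (OMAG-external) Kafka cluster.
KAFKA_BOOTSTRAP = "kafka.example.svc:9092"


def crux_config(poll_ms: int = 20) -> dict:
    """Build the Crux config that every OMAG Server Platform pod receives.

    Assumption: key/module names as recalled from the Crux JSON config
    format; verify against the Crux documentation before use.
    """
    if not 10 <= poll_ms <= 50:
        raise ValueError("poll interval should be 10-50 ms for this sample")
    # ISO-8601 duration string, e.g. 20 ms -> "PT0.020S"
    poll = f"PT{poll_ms / 1000:.3f}S"
    kafka = {"bootstrap-servers": KAFKA_BOOTSTRAP}
    return {
        # Local to each pod: a RocksDB index on the pod's own volume.
        "crux/index-store": {
            "kv-store": {
                "crux/module": "crux.rocksdb/->kv-store",
                "db-dir": "data/index-store",
            }
        },
        # Shared by all pods: Kafka-backed document store ...
        "crux/document-store": {
            "crux/module": "crux.kafka/->document-store",
            "kafka-config": kafka,
            "poll-wait-duration": poll,  # assumption: accepted here too
        },
        # ... and Kafka-backed transaction log, polled every `poll`.
        "crux/tx-log": {
            "crux/module": "crux.kafka/->tx-log",
            "kafka-config": kafka,
            "poll-wait-duration": poll,
        },
    }


if __name__ == "__main__":
    # The same document is handed to each of the 2-3 pods.
    print(json.dumps(crux_config(20), indent=2))
```

Because only the index store is pod-local, adding or removing pods does not change this document; each pod simply rebuilds its RocksDB index from the shared transaction log.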

Also document the structure of such a configuration for reference purposes (explaining that Kafka is used only as an example, and that other external mechanisms such as S3, JDBC, etc. could be used instead).

@cmgrote cmgrote added the enhancement New feature or request label May 10, 2021
@cmgrote cmgrote self-assigned this May 10, 2021
cmgrote referenced this issue in cmgrote/egeria-connector-xtdb May 14, 2021
Signed-off-by: Christopher Grote <chris@thegrotes.net>
cmgrote added a commit that referenced this issue May 14, 2021
#127 Initial chart for providing a sample high availability config
@cmgrote
Member Author

cmgrote commented May 15, 2021

Also consider documenting a more dynamic HA deployment:

  • New connector OMAG pods that can come online (or be dropped) at any time
  • Some quorum mechanism across the pods so that one of the pods can be elected to periodically create an index checkpoint and store in some out-of-cluster location (e.g. S3)
  • Initial index store of each new OMAG pod taken from the latest such external checkpoint (see: https://opencrux.com/reference/21.04-1.16.0/checkpointing.html)
  • A readiness probe that would ideally succeed only once the pod's local index is up to date (not sure this would be feasible: assuming there is always some activity happening via other pods, what would indicate that the index is up to date?)
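For the checkpointing bullet, the Crux checkpointing docs (linked above) describe the checkpointer as being declared on the local index's key-value store, so a new pod can restore its index from the latest stored checkpoint. The sketch below adds such a block to an existing per-pod config. It assumes the module names `crux.checkpoint/->checkpointer` and `crux.s3.checkpoint/->cp-store` and the keys `store`, `bucket` and `approx-frequency` (verify all of them against the docs for the release in use); `with_checkpointer` and the bucket name are hypothetical. It does not implement the quorum/election step; it only shows where the out-of-cluster checkpoint location would be configured.

```python
import copy


def with_checkpointer(config: dict, bucket: str, frequency: str = "PT6H") -> dict:
    """Return a copy of `config` whose local index is checkpointed to S3.

    Assumption: module/key names as recalled from the Crux checkpointing
    docs; `bucket` is a hypothetical S3 bucket name.
    """
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    kv_store = cfg["crux/index-store"]["kv-store"]
    kv_store["checkpointer"] = {
        "crux/module": "crux.checkpoint/->checkpointer",
        "store": {
            "crux/module": "crux.s3.checkpoint/->cp-store",
            "bucket": bucket,
        },
        # ISO-8601 duration: roughly how often a checkpoint is taken.
        "approx-frequency": frequency,
    }
    return cfg


if __name__ == "__main__":
    base = {
        "crux/index-store": {
            "kv-store": {
                "crux/module": "crux.rocksdb/->kv-store",
                "db-dir": "data/index-store",
            }
        }
    }
    print(with_checkpointer(base, "my-crux-checkpoints"))
```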

This will rely on having a configuration mechanism for the OMAG platform itself that does not require configuration and/or startup via REST. Otherwise the readiness probe would have to succeed just so that the platform could be configured and started, at which point the pod would already start receiving other traffic via a load-balancing service. All of that traffic would fail until the connector is configured and started, which takes at least 20-30 seconds for an empty system and could take several minutes or longer if the pod is also bootstrapping its index. Several minutes of "random" failures for requests that the load-balancer happens to send to this bootstrapping pod would be unacceptable. Hence the dependency on a non-REST mechanism to start the pods, so that the readiness probe can indicate that the pod is truly ready to start receiving and (correctly) responding to requests.
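One possible shape for the readiness check discussed above, as a hypothetical sketch rather than anything in Egeria or Crux: the pod reports ready only after its connector has been started by the non-REST mechanism and its local index is within `max_lag` transactions of the shared log. It assumes the pod can obtain the latest completed (locally indexed) and latest submitted transaction ids, which Crux nodes expose via their latest-completed-tx and latest-submitted-tx queries; `PodState`, `is_ready` and `max_lag` are illustrative names. The tolerance is one answer to the open question of constant activity from other pods: a lag of exactly zero may rarely be observed under load.

```python
from dataclasses import dataclass


@dataclass
class PodState:
    """Hypothetical snapshot of one OMAG pod's startup progress."""
    connector_started: bool   # connector configured/started without REST
    latest_completed_tx: int  # newest tx id indexed locally (-1 if none)
    latest_submitted_tx: int  # newest tx id on the shared tx log (-1 if none)


def is_ready(state: PodState, max_lag: int = 0) -> bool:
    """Readiness heuristic for the pod's readiness probe.

    Not ready until the connector is up; after that, ready once the local
    index trails the shared transaction log by at most `max_lag` txs.
    """
    if not state.connector_started:
        return False
    return state.latest_submitted_tx - state.latest_completed_tx <= max_lag


if __name__ == "__main__":
    bootstrapping = PodState(connector_started=True,
                             latest_completed_tx=120, latest_submitted_tx=500)
    print(is_ready(bootstrapping, max_lag=5))  # still catching up
```

Keeping the probe purely a function of connector state and index lag means the load-balancer never routes requests to a pod that is still bootstrapping.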

@cmgrote
Member Author

cmgrote commented May 21, 2021

Moved dynamic deployment to a new issue #150, given its dependency on Egeria core changes. Initial documentation of the original issue is now complete: https://odpi.github.io/egeria-connector-crux/high-availability/

@cmgrote cmgrote closed this as completed May 21, 2021