High Availability #470
Comments
Spire-controller-manager can't be separated unfortunately as it needs access to the We might do a trick to run it just on one of the replicas until |
Create a leader election so only one operates.
|
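To make the leader-election idea concrete, here is a minimal sketch of lease-based election in Python. It is an in-memory illustration only, not how spire-controller-manager or Kubernetes implements it: `LeaseStore`, `try_acquire_or_renew`, the replica names, and the 15-second duration are all hypothetical, and a real deployment would need the shared lease stored in something every replica can update atomically.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None   # identity of the current leader, if any
    renew_time: float = 0.0        # timestamp of the last successful renewal
    duration: float = 15.0         # seconds a lease stays valid without renewal


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared lock object."""

    def __init__(self, duration: float = 15.0) -> None:
        self._lease = Lease(duration=duration)
        self._mutex = threading.Lock()  # makes check-and-set atomic in-process

    def try_acquire_or_renew(self, candidate: str, now: float) -> bool:
        """Return True if `candidate` holds the lease after this call."""
        with self._mutex:
            lease = self._lease
            expired = now - lease.renew_time > lease.duration
            # A candidate wins if the lease is free, already its own, or expired.
            if lease.holder in (None, candidate) or expired:
                lease.holder = candidate
                lease.renew_time = now
                return True
            return False

    def leader(self) -> Optional[str]:
        with self._mutex:
            return self._lease.holder


store = LeaseStore(duration=15.0)
first = store.try_acquire_or_renew("replica-0", now=0.0)      # lease is free
second = store.try_acquire_or_renew("replica-1", now=5.0)     # still held by replica-0
takeover = store.try_acquire_or_renew("replica-1", now=30.0)  # replica-0 stopped renewing
```

Only the replica for which `try_acquire_or_renew` returns True would run the controller logic; the others keep retrying and take over once the holder stops renewing.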
Could you file an issue for that on the spire-controller-manager repo and reference it here? |
A StatefulSet can support HA, so that part isn't strictly needed. An external database should work with one of the HA database charts or operators. While not ideal, I believe the controller-manager already does locking in its current config. So I think it's possible today to build a usable HA configuration with the chart as is. |
If you set replicas > 1 today, spire-server stops working. We need better HA support. In general, spire-server needs to move towards stateless operation without the baggage of a controller or SQLite. Moving to a Deployment with a ReplicaSet encourages that. |
Could you please provide some more detail? How do things stop working? |
An integration test would be a better way to prove it does work. I don't know the details, but replicas > 1 caused an outage on our cluster. |
+1 to more tests... But I'm not seeing brokenness. More details please:
|
@drewwells Could you have a look at either the postgres or the mysql example? Using that example you should be able to increase the number of replicas. Please let us know if that works. Hopefully that helps us figure out how it differs from your current attempt. Based on that we might find a way to prevent the misconfiguration in this chart. |
Regardless of the database, my concern is running multiple spire-controller-managers. The first comment noted that a spire-controller-manager ticket was needed here. Unless there are plans to build consensus across multiple stateful pods, as etcd does, we should drop the StatefulSet and use a Deployment. |
The spire-controller-manager itself takes a Kubernetes lock to prevent multiple instances from acting at once. So I believe that part is working today. The StatefulSet is currently required to persist the intermediate CAs for each individual server. If the StatefulSet members are spread across multiple nodes using anti-affinity, it should still be HA. |
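As a rough illustration of the anti-affinity spreading described above, the following Python sketch builds a standard Kubernetes `podAntiAffinity` stanza and prints it as JSON (which is also valid YAML). The helper name `anti_affinity` and the label `app.kubernetes.io/name: server` are hypothetical, and whether the chart accepts the stanza under an `affinity` values key is an assumption to check against the chart's values file.

```python
import json


def anti_affinity(match_labels, topology_key="kubernetes.io/hostname"):
    """Build a hard anti-affinity rule: no two matching pods on the same node."""
    return {
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        # Pods carrying these labels repel each other.
                        "labelSelector": {"matchLabels": dict(match_labels)},
                        # One pod per distinct value of this node label.
                        "topologyKey": topology_key,
                    }
                ]
            }
        }
    }


# Hypothetical label; use whatever labels the server pods actually carry.
values = anti_affinity({"app.kubernetes.io/name": "server"})
print(json.dumps(values, indent=2))
```

With a required rule like this, a replica that cannot be placed on a distinct node stays pending, so the cluster needs at least as many schedulable nodes as replicas.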
Support a high availability installation of spire-server that tolerates node outages and allows upgrades with zero downtime. HA would support only external databases and multiple pods, with anti-affinity rules to spread the pods across nodes. A database in a container would be a great step towards a portable example of this deployment.
Some changes that need to happen: