Make sure prometheus PVs of all our clusters have deletionPolicy set to retain #2717
Labels
allocation:internal-eng
nominated-to-be-resolved-during-q4-2023
Nomination to be resolved during q4 goal of reducing the technical debt
tech:prometheus
Comments
yuvipanda
added a commit
to yuvipanda/pilot-hubs
that referenced
this issue
Jun 26, 2023
Helps us recover prometheus data in case of accidental deletion.
Fixes 2i2c-org#2717
yuvipanda
added a commit
to yuvipanda/pilot-hubs
that referenced
this issue
Jun 28, 2023
Helps us recover prometheus data in case of accidental deletion.
Fixes 2i2c-org#2717
yuvipanda
added a commit
to yuvipanda/pilot-hubs
that referenced
this issue
Aug 7, 2023
We add this unconditionally to all clusters for simplicity, so we can set storageClass: gp3 for new clusters that come up on AWS without issue. This doesn't change the default StorageClass, and does not change the StorageClass used in existing clusters.

In addition to using gp3, it also sets reclaimPolicy to Retain, so if the PVC is deleted, neither the PV nor the underlying EBS volume is deleted.

Ref 2i2c-org#2906
Ref 2i2c-org#2717
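As a rough illustration of what such a StorageClass might look like, here is a minimal Python sketch that builds the manifest as a plain dict and prints it as JSON (which `kubectl apply -f` accepts). The provisioner `ebs.csi.aws.com`, the `volumeBindingMode` value, and the function name are assumptions for illustration, not taken from the commit; only `gp3` and `reclaimPolicy: Retain` come from the message above.

```python
import json


def gp3_retain_storage_class(name="gp3"):
    """Build a StorageClass manifest for gp3 EBS volumes that are kept on PVC deletion.

    Assumptions (not from the commit): the EBS CSI driver is installed and
    exposes the provisioner "ebs.csi.aws.com".
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        # No "storageclass.kubernetes.io/is-default-class" annotation is set,
        # so the cluster's existing default StorageClass is left unchanged.
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",  # assumed provisioner
        "parameters": {"type": "gp3"},
        # Retain: deleting the PVC leaves the PV and the EBS volume in place.
        "reclaimPolicy": "Retain",
        "volumeBindingMode": "WaitForFirstConsumer",  # assumed, a common choice
    }


if __name__ == "__main__":
    print(json.dumps(gp3_retain_storage_class(), indent=2))
```

Because the class is not marked as default, only PVCs that explicitly request `storageClassName: gp3` would use it.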
Closed
yuvipanda
added a commit
to yuvipanda/pilot-hubs
that referenced
this issue
Sep 9, 2023
- Set up our own StorageClass for GKE clusters specifically for use with prometheus data.
- Sets reclaimPolicy to 'Retain', so we don't accidentally kill the disk and lose all the data.
- Sets the disk type to 'Balanced', which is backed by SSDs and *much* faster than spinning disks. No more grafana timeouts!
- Move the existing data by manually attaching to a small VM I created, and then copying over to new PVC.
- Reduction in size, as 2i2c-org#3093 drastically reduced the size of the data! We went from about 512GB to only about 150GB after that. The size explosion has been solved! 512GB here still gives us enough room to grow.

Once this lands, I'll manually go through and do this for every single GCP cluster. Grafana timeouts BE GONE.

Ref 2i2c-org#2934
Ref 2i2c-org#2717
Ref 2i2c-org#2847
Fixes 2i2c-org#3111
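The GKE setup described in this commit message could be sketched roughly as follows: a StorageClass backed by balanced persistent disks with `reclaimPolicy: Retain`, plus a PVC that requests 512Gi from it. The resource names (`balanced-retain`, `prometheus-server`), the provisioner `pd.csi.storage.gke.io`, and the use of `Gi` rather than GB are assumptions for illustration and are not taken from the commit.

```python
import json


def gke_prometheus_storage(class_name="balanced-retain",
                           claim_name="prometheus-server",
                           size="512Gi"):
    """Return (StorageClass, PersistentVolumeClaim) manifests as dicts.

    All names and the provisioner are hypothetical placeholders.
    """
    storage_class = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": class_name},
        "provisioner": "pd.csi.storage.gke.io",  # assumed: GKE PD CSI driver
        "parameters": {"type": "pd-balanced"},   # SSD-backed balanced disk
        "reclaimPolicy": "Retain",               # keep the disk if the PVC goes away
    }
    claim = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": class_name,
            "resources": {"requests": {"storage": size}},
        },
    }
    return storage_class, claim


if __name__ == "__main__":
    for manifest in gke_prometheus_storage():
        print(json.dumps(manifest, indent=2))
```

The PVC refers to the class by name, so volumes provisioned through it inherit the `Retain` policy at creation time.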
GeorgianaElena
pushed a commit
to GeorgianaElena/pilot-hubs
that referenced
this issue
Sep 12, 2023
Same commit message as the Sep 9, 2023 commit above.
We don't want to lose prometheus data, so every PV created for prometheus should have its reclaim policy (persistentVolumeReclaimPolicy) set to Retain. Follow-up to #2688
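For PVs that already exist, the reclaim policy can be changed in place with `kubectl patch pv <name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'`. The sketch below, under stated assumptions, takes the parsed output of `kubectl get pv -o json` and lists the patch commands needed. Matching PVs by the substring "prometheus" in the bound claim name is an assumption about naming, and the sample PV names are hypothetical.

```python
import json

# Patch body that switches a PV's reclaim policy to Retain.
RETAIN_PATCH = json.dumps({"spec": {"persistentVolumeReclaimPolicy": "Retain"}})


def patch_commands(pv_list, claim_substring="prometheus"):
    """Return kubectl argv lists for prometheus PVs that are not yet set to Retain.

    pv_list is the parsed JSON from `kubectl get pv -o json`.
    claim_substring is an assumed naming convention for prometheus PVCs.
    """
    commands = []
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        claim_name = (spec.get("claimRef") or {}).get("name", "")
        if claim_substring not in claim_name:
            continue  # not a prometheus volume
        if spec.get("persistentVolumeReclaimPolicy") == "Retain":
            continue  # already safe
        commands.append(
            ["kubectl", "patch", "pv", pv["metadata"]["name"], "-p", RETAIN_PATCH]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical sample data standing in for real cluster output.
    sample = {"items": [
        {"metadata": {"name": "pv-a"},
         "spec": {"persistentVolumeReclaimPolicy": "Delete",
                  "claimRef": {"name": "support-prometheus-server"}}},
    ]}
    for argv in patch_commands(sample):
        print(" ".join(argv))
```

Returning argument lists rather than running them keeps the sketch side-effect free; each list could be passed to `subprocess.run` once reviewed.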