Make sure prometheus PVs of all our clusters have deletionPolicy set to retain #2717

Open
yuvipanda opened this issue Jun 26, 2023 · 0 comments
Labels
allocation:internal-eng, nominated-to-be-resolved-during-q4-2023 (Nomination to be resolved during q4 goal of reducing the technical debt), tech:prometheus

Comments

@yuvipanda
Member

We don't want to lose prometheus data, so we should set the reclaim policy (persistentVolumeReclaimPolicy) of every PV created for prometheus to Retain. Follow-up to #2688
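
For PVs that already exist, one way to apply this is to patch each volume in place. The following is a minimal sketch, not the method used in this repository: it takes the parsed output of `kubectl get pv -o json` and builds a `kubectl patch` argument list for every PV that is bound to a prometheus claim and is not yet set to Retain. The function name `retain_patch_commands` and the assumption that the claim name contains "prometheus" are hypothetical.

```python
import json


def retain_patch_commands(pv_list, name_hint="prometheus"):
    """Build kubectl commands that switch matching PVs to Retain.

    pv_list is the parsed JSON from `kubectl get pv -o json`.
    name_hint is an assumption: we guess that the bound PVC's name
    contains this substring for prometheus volumes.
    """
    patch = json.dumps({"spec": {"persistentVolumeReclaimPolicy": "Retain"}})
    commands = []
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        claim = spec.get("claimRef") or {}
        # Skip PVs that are unbound or bound to an unrelated claim.
        if name_hint not in claim.get("name", ""):
            continue
        # Skip PVs that are already protected.
        if spec.get("persistentVolumeReclaimPolicy") == "Retain":
            continue
        commands.append(
            ["kubectl", "patch", "pv", pv["metadata"]["name"], "-p", patch]
        )
    return commands
```

Each returned list can be reviewed first and then passed to `subprocess.run(cmd, check=True)`. Returning the commands instead of running them keeps the selection logic easy to check before anything on the cluster changes.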

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 26, 2023
Helps us recover prometheus data in case of accidental
deletion

Fixes 2i2c-org#2717
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 28, 2023
Helps us recover prometheus data in case of accidental
deletion

Fixes 2i2c-org#2717
@damianavila damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Jul 4, 2023
@damianavila damianavila moved this to Review / QA 👀 in Sprint Board Jul 4, 2023
@damianavila damianavila moved this from Review / QA 👀 to In Progress ⚡ in Sprint Board Jul 5, 2023
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Aug 7, 2023
We add this unconditionally to all clusters for simplicity,
so we can set storageClass: gp3 for new clusters that come up on
AWS without issue. This doesn't change the default, and does not
change the StorageClass in existing clusters. In addition to using
gp3, it also sets reclaimPolicy to Retain, so if the PVC is deleted,
it does not delete the PV or the underlying EBS volume.

Ref 2i2c-org#2906
Ref 2i2c-org#2717
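
A StorageClass like the one this commit message describes could look roughly as follows. This is a sketch under stated assumptions, not the manifest from the commit: the provisioner `ebs.csi.aws.com` (the AWS EBS CSI driver), the binding mode and the function name `gp3_retain_storage_class` are assumptions. The manifest is emitted as JSON, which `kubectl apply -f` accepts.

```python
import json


def gp3_retain_storage_class(name="gp3"):
    """Return a StorageClass manifest for gp3 volumes that are retained."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        # No is-default-class annotation, so the cluster default is untouched.
        "metadata": {"name": name},
        # Assumption: the cluster runs the EBS CSI driver.
        "provisioner": "ebs.csi.aws.com",
        "parameters": {"type": "gp3"},
        # Deleting the PVC leaves the PV and the EBS volume in place.
        "reclaimPolicy": "Retain",
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    print(json.dumps(gp3_retain_storage_class(), indent=2))
```

Because the class is only used by PVCs that name it explicitly, adding it to a cluster does not affect volumes that already exist.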
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Sep 9, 2023
- Set up our own StorageClass for GKE clusters specifically for
  use with prometheus data.
- Set reclaimPolicy to 'Retain', so we don't accidentally
  delete the disk and lose all the data.
- Set the disk type to 'Balanced', which is backed by SSDs and
  *much* faster than spinning disks. No more grafana timeouts!
- Move the existing data by manually attaching the old disk to a
  small VM I created, and then copying it over to the new PVC.
- Reduction in size, as 2i2c-org#3093 drastically reduced the
  size of the data! We went from about 512GB to only about 150GB
  after that. The size explosion has been solved! 512GB here still
  gives us enough room to grow.

Once this lands, I'll manually go through and do this for every
single GCP cluster. Grafana timeouts BE GONE.

Ref 2i2c-org#2934
Ref 2i2c-org#2717
Ref 2i2c-org#2847
Fixes 2i2c-org#3111
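
The GKE setup described in this commit message can be sketched as a StorageClass plus a PVC that uses it. Everything named here is an assumption for illustration: the class name `balanced-rwo-retain`, the PVC name `prometheus-server`, the provisioner `pd.csi.storage.gke.io` with `type: pd-balanced`, and the reading of "512GB" as `512Gi`. The two objects are wrapped in a `v1` List so the JSON output can be passed to `kubectl apply -f` in one go.

```python
import json


def gke_prometheus_storage(sc_name="balanced-rwo-retain",
                           pvc_name="prometheus-server",
                           size="512Gi"):
    """Return a List manifest with a retained SSD-backed class and a PVC."""
    storage_class = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": sc_name},
        # Assumption: the cluster uses the Compute Engine PD CSI driver.
        "provisioner": "pd.csi.storage.gke.io",
        # Balanced persistent disks are SSD-backed.
        "parameters": {"type": "pd-balanced"},
        # Keep the disk even if the PVC is deleted.
        "reclaimPolicy": "Retain",
        "volumeBindingMode": "WaitForFirstConsumer",
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name},
        "spec": {
            "storageClassName": sc_name,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [storage_class, pvc]}


if __name__ == "__main__":
    print(json.dumps(gke_prometheus_storage(), indent=2))
```

In a Helm-managed deployment the PVC would normally come from the chart's values rather than a standalone manifest, so treat the PVC half as a stand-in for whatever setting selects the storage class and size.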
GeorgianaElena pushed a commit to GeorgianaElena/pilot-hubs that referenced this issue Sep 12, 2023
- Set up our own StorageClass for GKE clusters specifically for
  use with prometheus data.
- Set reclaimPolicy to 'Retain', so we don't accidentally
  delete the disk and lose all the data.
- Set the disk type to 'Balanced', which is backed by SSDs and
  *much* faster than spinning disks. No more grafana timeouts!
- Move the existing data by manually attaching the old disk to a
  small VM I created, and then copying it over to the new PVC.
- Reduction in size, as 2i2c-org#3093 drastically reduced the
  size of the data! We went from about 512GB to only about 150GB
  after that. The size explosion has been solved! 512GB here still
  gives us enough room to grow.

Once this lands, I'll manually go through and do this for every
single GCP cluster. Grafana timeouts BE GONE.

Ref 2i2c-org#2934
Ref 2i2c-org#2717
Ref 2i2c-org#2847
Fixes 2i2c-org#3111
@yuvipanda yuvipanda added the nominated-to-be-resolved-during-q4-2023 Nomination to be resolved during q4 goal of reducing the technical debt label Oct 16, 2023
@yuvipanda yuvipanda removed their assignment Jan 25, 2024
Projects
No open projects
2 participants