terraform: switch to pd-balanced, reduce size to save cost #2102

Merged: minrk merged 4 commits into jupyterhub:master from reduce-disk-size on Jan 25, 2022


minrk (Member) commented Jan 18, 2022

Reduces the PD-SSD disk size to 500GB, down from 1T. Should save ~$1k/month.

Ignore the version bumps, which are only to make terraform match what's already been auto-upgraded to, so the new pool has the same version as what's already deployed.

This creates a new user pool to replace the old one, but leaves the existing one with an autoscale max of 1 so it will slowly drain (will need some help with cordoning). Once it's drained, another PR can actually delete the old pool.
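
A minimal sketch of what the replacement pool could look like in Terraform, assuming the same `google_container_node_pool` resource type as the existing `user` pool. The resource name `user_new`, the pool name, the cluster reference, the machine type, and the autoscaling maximum are placeholders and not values from this PR. The `disk_type` and `disk_size_gb` values reflect the pd-balanced/800GB choice reached later in this thread, not the 500GB PD-SSD originally proposed.

```hcl
# Hypothetical replacement pool; names and sizes marked below are assumptions.
resource "google_container_node_pool" "user_new" {
  name     = "user-new"                            # placeholder pool name
  cluster  = google_container_cluster.cluster.name # placeholder cluster reference
  location = local.location                        # assumes local.location is the cluster location

  node_locations = ["${local.location}-a"]
  version        = local.gke_version # same version as the existing pool

  autoscaling {
    min_node_count = 0
    max_node_count = 10 # placeholder upper bound
  }

  node_config {
    machine_type = "n1-highmem-8" # placeholder machine type
    disk_type    = "pd-balanced"  # cheaper per GB than pd-ssd
    disk_size_gb = 800            # smaller than the previous 1T disk
  }
}
```

Because the old `user` pool stays in the configuration with `max_node_count = 1`, applying a change like this only adds the new pool. Deleting the old pool is left to a follow-up change once it has drained.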

I'll do the apply after this is merged. I've synced the versions, but not created the new pool.

xref: cost calculations

minrk added 2 commits January 18, 2022 12:07
matches current auto-upgrade version, doesn't actually upgrade nodes
@@ -71,6 +79,50 @@ resource "google_container_node_pool" "user" {
  node_locations = ["${local.location}-a"]
  version = local.gke_version

  autoscaling {
    min_node_count = 0
    max_node_count = 1
minrk (Member, Author) commented on this change:

This should not trigger immediate scale-down. Autoscale doesn't force these bounds to be satisfied continuously. Instead, it should only prevent scale-up.

should reduce operational costs by ~1k/month

sets autoscale-max on existing user pool to 1 to avoid allocation of new nodes.

Can delete old user pool once it's drained.
yuvipanda (Contributor) commented

Would this cost us performance in terms of docker pull throughput? That was why we made them big SSDs to begin with, as performance unfortunately scales linearly with size.

The newer pd-balanced disk type is also worth trying to cut cost. That's what I now use on my Berkeley clusters.

saves cost without losing as much space/performance
minrk (Member, Author) commented Jan 25, 2022

> Would this cost us performance in terms of docker pull throughput?

I have no idea. I don't think we've measured how much disk IO limits pulls compared to network IO. gcloud docs suggest a 512GB SSD should have 384/768 MBps read/write performance. That seems very fast! But it's ~1/2 what we should have now.

I think we should probably switch to pd-balanced and see what happens. Switching to pd-balanced without reducing the size would save 40%. We could get the same 50% saving as the original 500GB PD-SSD proposal by dropping to 800GB, so I did that. I didn't know about pd-balanced! I switched the core pool to it as well.
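
The percentages above can be checked with a little arithmetic. This is a sketch only: it assumes list prices of $0.17 per GB per month for pd-ssd and $0.10 per GB per month for pd-balanced (assumed figures, not from this thread, and they vary by region and over time), and it treats 1T as 1000GB. The local and output names are made up for illustration.

```hcl
locals {
  # Assumed list prices in USD per GB per month; check current pricing for your region.
  price_pd_ssd      = 0.17
  price_pd_balanced = 0.10

  # Monthly disk cost per node for each option.
  cost_before    = 1000 * local.price_pd_ssd      # 1T pd-ssd (baseline)
  cost_same_size = 1000 * local.price_pd_balanced # pd-balanced at the same size
  cost_800gb     = 800 * local.price_pd_balanced  # pd-balanced at 800GB

  # Fractional savings relative to the 1T pd-ssd baseline.
  saving_same_size = 1 - local.cost_same_size / local.cost_before
  saving_800gb     = 1 - local.cost_800gb / local.cost_before
}

output "disk_savings" {
  value = {
    pd_balanced_same_size = local.saving_same_size
    pd_balanced_800gb     = local.saving_800gb
  }
}
```

With those assumed prices the two savings come out to roughly 0.41 and 0.53, in line with the 40% and ~50% quoted above.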

minrk (Member, Author) commented Jan 25, 2022

OK, I'll give this a go.

minrk changed the title from "terraform: Reduce PD-SSD disk size to 500GB" to "terraform: switch to pd-balanced, reduce size to save cost" on Jan 25, 2022
minrk merged commit b830a88 into jupyterhub:master on Jan 25, 2022
minrk deleted the reduce-disk-size branch on January 25, 2022 11:38