
[Decommission Hub] Carbon Plan #3483

Closed · 16 of 18 tasks
colliand opened this issue Dec 1, 2023 · 18 comments · Fixed by #3544

@colliand (Contributor) commented Dec 1, 2023

Summary

CS&S forwarded a request, sent on 2023-11-30, asking that the Carbon Plan hub be decommissioned. The message includes the following text:

We will migrate all our materials off the 2i2c Hub by December 15th. If possible, we would appreciate it if the Hub is shut down on December 18th or 19th, as CarbonPlan will be on winter break December 20th-Jan 2nd, and it would be ideal if we’re available for any questions during the process.

Info

Task List

Phase I

  • Confirm with Community Representative that the hub is no longer in use and it's safe to decommission
  • Confirm if there is any data to migrate from the hub before decommissioning
    • If yes, confirm where the data should be migrated to
      • Confirm a 2i2c Engineer has access to the destination in order to complete the data migration
    • If no, confirm it is ok to delete all the data stored in the user home directories

Phase II - Hub Removal

(These steps are described in more detail in the docs at https://infrastructure.2i2c.org/en/latest/hub-deployment-guide/hubs/other-hub-ops/delete-hub.html; a consolidated command sketch follows the list below.)

  • Manage existing data (migrate data from the hub or delete it)
  • Delete the hub's authentication application on GitHub or CILogon (note CILogon removal requires the hub config in place)
  • Remove the appropriate config/clusters/<cluster_name>/<hub_name>.values.yaml files. A complete list of relevant files can be found under the appropriate entry in the associated cluster.yaml file.
  • Remove the associated hub entry from the config/clusters/<cluster_name>/cluster.yaml file.
  • Remove the hub deployment
    • helm --namespace HUB_NAME delete HUB_NAME
    • kubectl delete namespace HUB_NAME
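
A minimal end-to-end sketch of the Phase II steps, assuming placeholder names CLUSTER_NAME and HUB_NAME and the repo layout described above (the linked docs remain the authoritative reference):

```bash
# Remove the hub's values files from the infrastructure repo; the exact
# file list comes from the hub's entry in cluster.yaml (placeholder names).
git rm config/clusters/CLUSTER_NAME/HUB_NAME.values.yaml
# Then delete the hub's entry from config/clusters/CLUSTER_NAME/cluster.yaml
# by hand and commit both changes.

# Tear down the running deployment.
helm --namespace HUB_NAME delete HUB_NAME
kubectl delete namespace HUB_NAME
```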

Phase III - Cluster Removal

This phase is only necessary for single-hub clusters; a command sketch follows the list below.

  • Remove the cluster's datasource from the central Grafana with:
    • deployer grafana central-ds remove CLUSTER_NAME
  • Run terraform plan -destroy and terraform apply from the appropriate workspace, to destroy the cluster
  • Delete the terraform workspace: terraform workspace delete <NAME>
  • Remove the associated config/clusters/<cluster_name> directory and all its contents
  • Remove the cluster from CI:
  • Remove the cluster from the list of grafana datasources at https://grafana.pilot.2i2c.cloud/datasources
  • Remove A record from Namecheap account
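
A hedged sketch of the Phase III commands, assuming a Terraform workspace named after the cluster and run from the appropriate Terraform directory (all names are placeholders):

```bash
# Remove the cluster's datasource from the central Grafana.
deployer grafana central-ds remove CLUSTER_NAME

# Destroy the cloud resources from the cluster's Terraform workspace.
terraform workspace select CLUSTER_NAME
terraform plan -destroy -out=destroy.plan
terraform apply destroy.plan

# Workspaces can only be deleted from another workspace.
terraform workspace select default
terraform workspace delete CLUSTER_NAME

# Remove the cluster's config directory from the repo.
git rm -r config/clusters/CLUSTER_NAME
```
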
@damianavila (Contributor) commented:
Assigned @yuvipanda to process this decommission, given that he will be around during the requested shutdown time.

@damianavila removed their assignment Dec 8, 2023
@yuvipanda (Member) commented:
I'd also like us to have an 'exit interview' with them, and I'd like to be present for that. Thoughts, @colliand and @jmunroe?

@jmunroe (Contributor) commented Dec 8, 2023

Agreed. See https://2i2c.freshdesk.com/a/tickets/1156 for the follow-up with CarbonPlan. It looks like there might be a 30-minute call already scheduled with @colliand for after AGU.

@maxrjones (Contributor) commented:
@yuvipanda I'd be glad to chat separately if the already scheduled block at 12-12:30pm ET tomorrow doesn't work for you.

Thank you all for your work on the decommissioning. On the "Confirm if there is any data to migrate from the hub before decommissioning" item, does this refer to backing up the NFS? I've worked with users to back up their individual home directories, but I'm curious whether you have recommendations for backing up the full file storage system.

@yuvipanda (Member) commented:
@maxrjones I can tar up everyone's homedirs and put the archive in your homedir. How does that sound?

I'll try to join the meeting if I can!
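
For context, a minimal sketch of what that could look like, assuming the hub's home directories are all mounted under /home and that maxrjones is one of them (paths are illustrative, not the actual layout):

```bash
# Archive every home directory into a single tarball placed inside
# maxrjones's home directory; exclude the archive itself so tar does
# not try to include its own output.
cd /home
sudo tar --exclude='maxrjones/homes.tar.gz' -czf maxrjones/homes.tar.gz .
```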

@maxrjones (Contributor) commented:
> @maxrjones I can tar up everyone's homedirs and put the archive in your homedir. How does that sound?

This seems like a good idea to me, thank you!

I know this is unlikely, but we had a Hub on GCP that was decommissioned sometime during late summer to early fall 2022. One person who was on leave during that time lost their home directory in that process. Do you happen to have a record of whether any backup happened, or whether the storage part of that infrastructure remains?

@yuvipanda (Member) commented:
It was wonderful chatting with you today, @maxrjones! There's now a homes.tar.gz in your own home directory that is a compressed archive of everyone's home directories. Can you download that and let me know when you are done, so I can then proceed to decommissioning? I'll decommission this Wednesday-Friday so nobody is interrupted.

Unfortunately, I don't think anything remains from the GCP time, so we cannot retrieve that.

@maxrjones (Contributor) commented:
> It was wonderful chatting with you today, @maxrjones! There's now a homes.tar.gz in your own home directory that is a compressed archive of everyone's home directories. Can you download that and let me know when you are done, so I can then proceed to decommissioning? I'll decommission this Wednesday-Friday so nobody is interrupted.

Great chatting with you as well! Thank you, I successfully downloaded it.

@yuvipanda (Member) commented:
Great, @maxrjones! I'll decommission this sometime in the next few days and update this issue.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Dec 21, 2023
github-project-automation bot moved this from Needs Shaping / Refinement to Complete in DEPRECATED Engineering and Product Backlog Dec 21, 2023
github-project-automation bot moved this from Todo 👍 to Done 🎉 in Sprint Board Dec 21, 2023
@yuvipanda reopened this Dec 21, 2023
@yuvipanda (Member) commented:
@maxrjones I've cleaned up all the resources that we created. I see a couple of Nebari-related resources, so I have not touched them. I'd appreciate it if you (or whoever is taking care of the Nebari stuff) could take a look to make sure that all the resources (EBS volumes in particular) that are still present are expected to be there, and not leftovers from our cluster!

@maxrjones (Contributor) commented:
> @maxrjones I've cleaned up all the resources that we created. I see a couple of Nebari-related resources, so I have not touched them. I'd appreciate it if you (or whoever is taking care of the Nebari stuff) could take a look to make sure that all the resources (EBS volumes in particular) that are still present are expected to be there, and not leftovers from our cluster!

Thanks, Yuvi! I'll take a look tomorrow.

@choldgraf (Member) commented:
FYI, I think removing CarbonPlan is messing up our global usage dashboard; I've opened an issue here:

@maxrjones (Contributor) commented:
> @maxrjones I've cleaned up all the resources that we created. I see a couple of Nebari-related resources, so I have not touched them. I'd appreciate it if you (or whoever is taking care of the Nebari stuff) could take a look to make sure that all the resources (EBS volumes in particular) that are still present are expected to be there, and not leftovers from our cluster!

I reached out to the Nebari folks about the EBS volumes and confirmed that those resources are still in use. There's also a handful of older volumes from 2021 named kubernetes-dynamic-pvc-.... These aren't associated with the Nebari infrastructure. It looks like the activity on the 400 GB volume ended at the same time as the resource cleanup, so I'm guessing this is the shared filesystem along with other components from 2i2c. Does that sound right to you? Do you recommend taking a look at any of the other data or backing up anything apart from what we already archived from the Hub home directories?
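
For reference, a hedged AWS CLI sketch for listing those volumes, assuming the console name corresponds to the volumes' Name tag (region and profile flags omitted):

```bash
# List EBS volumes whose Name tag starts with "kubernetes-dynamic-pvc-",
# showing their IDs, sizes, state, and creation times.
aws ec2 describe-volumes \
  --filters "Name=tag:Name,Values=kubernetes-dynamic-pvc-*" \
  --query "Volumes[].{Id:VolumeId,SizeGiB:Size,State:State,Created:CreateTime}" \
  --output table
```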

@yuvipanda (Member) commented:
@maxrjones ah, I'll take a look at those early next week and see what they may be and get back to you!

@yuvipanda (Member) commented:
@maxrjones I took a quick look, and these can all be removed as well. kubernetes-dynamic-pvc-297b08cc-43b7-4bc3-8f31-3528898571c1 has Prometheus usage data from the cluster, which you can save if you want. Otherwise, these are all safe to delete.
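
A hedged sketch of how that cleanup could look with the AWS CLI, using a placeholder volume ID (vol-0123456789abcdef0 stands in for the volume holding the Prometheus data; repeat the delete for the other leftover volumes):

```bash
# Optionally snapshot the Prometheus volume before removing it.
SNAP_ID=$(aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "Prometheus data from decommissioned 2i2c CarbonPlan cluster" \
  --query SnapshotId --output text)

# Wait until the snapshot finishes, then delete the volume.
aws ec2 wait snapshot-completed --snapshot-ids "$SNAP_ID"
aws ec2 delete-volume --volume-id vol-0123456789abcdef0
```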

@damianavila moved this from Done 🎉 to Waiting 🕛 in Sprint Board Jan 18, 2024
@yuvipanda (Member) commented:
@maxrjones just wanted to check in and see if there's anything else we can do to help :)

@yuvipanda (Member) commented:
@maxrjones I'm going to close this one out! Let us know if there's anything more we need to do :) It was great working with you over the last few years! <3

github-project-automation bot moved this from Waiting 🕛 to Done 🎉 in Sprint Board Feb 2, 2024
@maxrjones (Contributor) commented:
> @maxrjones I'm going to close this one out! Let us know if there's anything more we need to do :) It was great working with you over the last few years! <3

Thanks @yuvipanda! It was great working with you as well. Super grateful for all that you and your team members bring to the open source community!
