From d88200d6bccdb357485029b5522a9ab389626c21 Mon Sep 17 00:00:00 2001 From: David Cook Date: Thu, 31 Oct 2024 18:00:30 +1100 Subject: [PATCH 1/4] Add checklist for server migrations --- .github/ISSUE_TEMPLATE/server-migration.md | 97 ++++++++++++++++++++++ 1 file changed, 97 insertions(+) create mode 100644 .github/ISSUE_TEMPLATE/server-migration.md diff --git a/.github/ISSUE_TEMPLATE/server-migration.md b/.github/ISSUE_TEMPLATE/server-migration.md new file mode 100644 index 000000000..0e96c42b0 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/server-migration.md @@ -0,0 +1,97 @@ +--- +name: Server migration +about: Checklist to migrate to an upgraded server +title: '' +labels: '' +assignees: '' + +--- + +Checklist based on general guide https://github.com/openfoodfoundation/ofn-install/wiki/Migrating-a-Production-Server + +## 1. Setting up the new server +- [ ] Check old server config for any additional services to be aware of (eg `/etc/nginx/sites-available`). Document any necessary steps for migration. +- [ ] Hosting: provision new server with Ubuntu 20 +- [ ] DNS: add temporary domain (eg `prod2.openfoodnetwork.org`) + +### config +- [ ] Add temporary name to `inventory/hosts` +- [ ] Review `host_vars/x/config.yml`, clean up if needed + - Make a copy for the temp hostname, add temp domain to bottom of `certbot_domains` +- [ ] Review `ofn-secrets:x_prod/secrets.yml`, clean up if needed + - Change to shared bugsnag projects + - Don't bother making a copy of this one + +### setup +Ensure you have the correct secrets. +`ansible-playbook -l x_prod2 -e "@../ofn-secrets/x_prod/secrets.yml playbooks/` +- [ ] `letsencrypt_proxy.yml` +- [ ] `setup.yml` +- [ ] `provision.yml` +- [ ] `deploy.yml` +- [ ] `db_integrations` (Permit DB access for n8n, Metabase) +- [ ] Ensure sidekiq is disabled, to avoid creating subscription orders: `systemctl disable sidekiq` + +### initial migration +`ansible-playbook -l x_prod -e rsync_to=x_prod2 playbooks/` +- [ ] Setup direct ssh access for `ofn-admin` and `openfoodnetwork` as per guide +- [ ] `db_transfer.yml` +- [ ] `transfer_assets.yml` + +## 2. Testing + - [ ] test `reboot` + - [ ] send test mail (`/admin/mail_methods/edit`). + - [ ] terms of service file: `/admin/terms_of_service_files` + - [ ] shop catalogue display correctly, with images, add to cart, begin checkout, login + - note: check cookies if login won't work + - Check integrations + - [ ] Payments (check Stripe connect status `/admin/stripe_connect_settings/edit`) + - [ ] New Relic + - [ ] Bugsnag + +## 3. Migration +### preparation +- [ ] `deploy.yml -l x_prod2 -e "git_version=vX.Y.Z"` matching version with current prod +- [ ] old server: make a tiny data change to verify later (eg add `.` in meta description `/admin/general_settings/edit`) + +### switchover: old server +- [ ] 🚧 `maintenance_mode.yml` +- [ ] `sudo systemctl stop sidekiq redis-jobs` +- [ ] Transfer `/var/lib/redis-jobs/dump.rdb` to new server (see guide) +- [ ] `db_transfer.yml` ~3min +- [ ] `transfer_assets.yml` just in case +- [ ] `sudo systemctl stop puma postgres` (ensure other integrations no longer touch it) + +### switchover: new server +- [ ] `sudo systemctl restart puma; sudo systemctl start sidekiq redis-jobs` +- [ ] `Rails.cache.clear` (or migrate redis-cache/dump.rdb also) +- [ ] ⏭️ `temporary_proxy.yml -e 'proxy_target='` redirect traffic to new prod + * Note: this doesn't include webservices, and doesn't handle images. So it's a very short-term fix if at all. + * Use a `hosts` file entry to test a direct connection +- Check there are no alarm bells, eg: + - [ ] `~/apps/openfoodnetwork/current/logs/production.log` and `sidekiq.log` + - [ ] tiny data change is present. undo it. + - [ ] shopfront and checkout looks good + - [ ] upload a product image + - [ ] get confirmation from local team +- [ ] Update DNS to point to new server + +## 4. Cleanup (after 48hrs) +- [ ] check server access logs to verify no traffic +- [ ] shut down the old server, cancel old VPS +- [ ] remove DNS for temporary subdomain +- [ ] make sure the entries in ofn-install are up to date: remove the temporary entry made for the migration, and set the new IP address. +- [ ] validate that `provision.yml` still works. This will rename x-prod2 to x-prod +- [ ] check metabase sync if required: https://data.openfoodnetwork.org.uk/admin/databases/ +- [ ] check n8n +- [ ] check backups are functioning +- Update documentation: + * [ ] https://github.com/openfoodfoundation/ofn-install/wiki/Current-servers + * migration guide if necessary + + +## Rollback plan +* If an error occurs before the temporary proxy is active, and can't be resolved quickly, then restore service back to current server +* If an error occurs after proxy is active, users may have interacted with the new server (eg made payments. + * if serious, consider putting into maintenance mode (and stop sidekiq) to avoid further changes + * otherwise seek to resolve issue in-place. From 49b3b4f4818b58de66cd4b24591a865c6fec7f20 Mon Sep 17 00:00:00 2001 From: David Cook Date: Mon, 4 Nov 2024 11:41:30 +1100 Subject: [PATCH 2/4] Updates from code review --- .github/ISSUE_TEMPLATE/server-migration.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/server-migration.md b/.github/ISSUE_TEMPLATE/server-migration.md index 0e96c42b0..c1098bedf 100644 --- a/.github/ISSUE_TEMPLATE/server-migration.md +++ b/.github/ISSUE_TEMPLATE/server-migration.md @@ -23,16 +23,20 @@ Checklist based on general guide https://github.com/openfoodfoundation/ofn-insta - Don't bother making a copy of this one ### setup -Ensure you have the correct secrets. +Enable passthrough on _current_ server to allow new server to generate a certificate: +- [ ] `ansible-playbook playbooks/letsencrypt_proxy.yml -l x_prod -e "proxy_target=" ` + +Then setup new server. Ensure you have the correct secrets (current secrets are usually fine). `ansible-playbook -l x_prod2 -e "@../ofn-secrets/x_prod/secrets.yml playbooks/` -- [ ] `letsencrypt_proxy.yml` - [ ] `setup.yml` - [ ] `provision.yml` - [ ] `deploy.yml` - [ ] `db_integrations` (Permit DB access for n8n, Metabase) -- [ ] Ensure sidekiq is disabled, to avoid creating subscription orders: `systemctl disable sidekiq` ### initial migration +- [ ] Ensure sidekiq is disabled, to avoid creating subscription orders when data is loaded: + `sudo systemctl stop sidekiq && sudo systemctl disable sidekiq` + `ansible-playbook -l x_prod -e rsync_to=x_prod2 playbooks/` - [ ] Setup direct ssh access for `ofn-admin` and `openfoodnetwork` as per guide - [ ] `db_transfer.yml` @@ -56,11 +60,11 @@ Ensure you have the correct secrets. ### switchover: old server - [ ] 🚧 `maintenance_mode.yml` -- [ ] `sudo systemctl stop sidekiq redis-jobs` +- [ ] `sudo systemctl stop sidekiq redis-jobs puma` - [ ] Transfer `/var/lib/redis-jobs/dump.rdb` to new server (see guide) - [ ] `db_transfer.yml` ~3min +- [ ] `sudo systemctl stop postgres` (ensure other integrations no longer touch it) - [ ] `transfer_assets.yml` just in case -- [ ] `sudo systemctl stop puma postgres` (ensure other integrations no longer touch it) ### switchover: new server - [ ] `sudo systemctl restart puma; sudo systemctl start sidekiq redis-jobs` @@ -87,7 +91,7 @@ Ensure you have the correct secrets. - [ ] check backups are functioning - Update documentation: * [ ] https://github.com/openfoodfoundation/ofn-install/wiki/Current-servers - * migration guide if necessary + * [ ] This migration guide if necessary ## Rollback plan From 3423e1f686c4dc45db07bfc585b5a893d1945653 Mon Sep 17 00:00:00 2001 From: David Cook Date: Mon, 4 Nov 2024 11:50:15 +1100 Subject: [PATCH 3/4] Reset db before deployment I have to admit I'm a bit scared of running db:reset, will need to double-check I'm definitely on the right server.. --- .github/ISSUE_TEMPLATE/server-migration.md | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/ISSUE_TEMPLATE/server-migration.md b/.github/ISSUE_TEMPLATE/server-migration.md index c1098bedf..3c830ee00 100644 --- a/.github/ISSUE_TEMPLATE/server-migration.md +++ b/.github/ISSUE_TEMPLATE/server-migration.md @@ -55,6 +55,7 @@ Then setup new server. Ensure you have the correct secrets (current secrets are ## 3. Migration ### preparation +- [ ] new server: `bundle exec rake db:reset -e production` - [ ] `deploy.yml -l x_prod2 -e "git_version=vX.Y.Z"` matching version with current prod - [ ] old server: make a tiny data change to verify later (eg add `.` in meta description `/admin/general_settings/edit`) From f12d47fafe2660b1311a017beee80e81f217f80c Mon Sep 17 00:00:00 2001 From: David Cook Date: Wed, 6 Nov 2024 12:24:00 +1100 Subject: [PATCH 4/4] Small tweaks --- .github/ISSUE_TEMPLATE/server-migration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/server-migration.md b/.github/ISSUE_TEMPLATE/server-migration.md index 3c830ee00..dc00dca45 100644 --- a/.github/ISSUE_TEMPLATE/server-migration.md +++ b/.github/ISSUE_TEMPLATE/server-migration.md @@ -24,7 +24,7 @@ Checklist based on general guide https://github.com/openfoodfoundation/ofn-insta ### setup Enable passthrough on _current_ server to allow new server to generate a certificate: -- [ ] `ansible-playbook playbooks/letsencrypt_proxy.yml -l x_prod -e "proxy_target=" ` +- [ ] `ansible-playbook playbooks/letsencrypt_proxy.yml -l x_prod -e "proxy_target=" ` Then setup new server. Ensure you have the correct secrets (current secrets are usually fine). `ansible-playbook -l x_prod2 -e "@../ofn-secrets/x_prod/secrets.yml playbooks/` @@ -36,9 +36,9 @@ Then setup new server. Ensure you have the correct secrets (current secrets are ### initial migration - [ ] Ensure sidekiq is disabled, to avoid creating subscription orders when data is loaded: `sudo systemctl stop sidekiq && sudo systemctl disable sidekiq` +- [ ] Setup direct ssh access for `ofn-admin` and `openfoodnetwork` as per guide `ansible-playbook -l x_prod -e rsync_to=x_prod2 playbooks/` -- [ ] Setup direct ssh access for `ofn-admin` and `openfoodnetwork` as per guide - [ ] `db_transfer.yml` - [ ] `transfer_assets.yml`