Commit c722a32

committed
WIP
1 parent 94f00da commit c722a32

2 files changed

+81
-141
lines changed

doc/how_to_upgrade_persistent_instances.rst

Lines changed: 79 additions & 139 deletions
@@ -11,152 +11,85 @@ How to upgrade persistent instances (Amazon AWS)
1111
This article describes how to upgrade persistent instances (e.g. copr-fe-dev) to
1212
a new Fedora version.
1313

14+
TODO: schedule outage.
1415

1516
Requirements
1617
============
1718

18-
* access to `Amazon AWS`_
19-
* ssh access to batcave01
20-
* permissions to update aws.fedoraproject.org DNS records
21-
19+
* access to the team's `Amazon AWS account`_, and having that account properly
20+
configured according to the `README.md <helper playbook repository_>`_
21+
* permissions to run playbooks on `batcave01 <playbook SOP_>`_
2222

2323

2424
Pre-upgrade
2525
===========
2626

27-
The goal is to do as much work pre-upgrade as possible while focusing
28-
only on important things and not creating a work overload with tasks,
29-
that can be done post-upgrade.
27+
The goal is to do as much work as possible before the upgrade, so that the
28+
**outage window** stays as short as possible, while still doing only the
29+
important things (and not taking on work that can be done post-upgrade).
3030

3131
Don't do the pre-upgrade too long before the actual upgrade. Ideally a couple of
3232
hours or a day before.
3333

3434

35-
Launch a new instance
36-
---------------------
37-
38-
First, login into `Amazon AWS`_, otherwise the following step will not
39-
work. Once you are logged-in, feel free to close the page.
40-
41-
42-
1. Choose AMI
43-
.............
44-
45-
Navigate to the `Cloud Base Images`_ download page and scroll down to
46-
the section with cloud base images for Amazon public cloud. Use
47-
``Click to launch`` button to launch an instance from the x86_64
48-
AMI. Select the US East (N. Virginia) region.
49-
50-
You will get redirected to the Amazon AWS page.
51-
52-
53-
2. Name and tags
54-
................
55-
56-
- Set ``Name`` and add ``-new`` suffix (e.g. ``copr-distgit-dev-new``
57-
or ``copr-distgit-prod-new``)
58-
- Set ``CoprInstance`` to ``devel`` or ``production``
59-
- Set ``CoprPurpose`` to ``infrastructure``
60-
- Set ``FedoraGroup`` to ``copr``
61-
62-
63-
3. Application and OS Images (Amazon Machine Image)
64-
...................................................
65-
66-
Skip this section, we already chose the correct AMI from the Fedora
67-
website.
68-
69-
70-
4. Instance type
71-
................
72-
73-
Currently, we use the following instance types:
74-
75-
+----------------+-------------+-------------+
76-
| | Dev | Production |
77-
+================+=============+=============+
78-
| **frontend** | t3a.medium | t3a.xlarge |
79-
+----------------+-------------+-------------+
80-
| **backend** | t3a.medium | m5a.4xlarge |
81-
+----------------+-------------+-------------+
82-
| **keygen** | t3a.small | t3a.xlarge |
83-
+----------------+-------------+-------------+
84-
| **distgit** | t3a.medium | t3a.medium |
85-
+----------------+-------------+-------------+
86-
| **pulp** | t3a.medium | TODO |
87-
+----------------+-------------+-------------+
88-
89-
When more power is needed, please use the `ec2instances.info`_ comparator to get
90-
the cheapest available instance type according to our needs.
91-
92-
93-
5. Key pair (login)
94-
...................
95-
96-
- Make sure to use existing key pair named ``Ansible Key``. This allows us to
97-
run the playbooks on ``batcave01`` box against the newly spawned VM.
98-
99-
100-
6. Network settings
101-
...................
102-
103-
- Click the ``Edit`` button in the box heading to show more options
104-
- Select VPC ``vpc-0af***********972``
105-
- Select ``Subnet`` to be ``us-east-1c``
106-
- Switch ``Auto-assign IPv6 IP`` to ``Enable``
107-
- Switch to ``Select existing security group`` and pick one of
108-
109-
- ``copr-frontend-sg``
110-
- ``copr-backend-sg``
111-
- ``copr-distgit-sg``
112-
- ``copr-keygen-sg``
113-
- ``copr-pulp-sg``
114-
115-
116-
7. Configure storage
117-
....................
118-
119-
- Click the ``Advanced`` button in the box heading to show more options
120-
- Update the ``Size (GiB)`` of the root partition
121-
122-
+----------------+-------------+-------------+
123-
| | Dev | Production |
124-
+================+=============+=============+
125-
| **frontend** | 50G | 50G |
126-
+----------------+-------------+-------------+
127-
| **backend** | 20G | 100G |
128-
+----------------+-------------+-------------+
129-
| **keygen** | 10G | 20G |
130-
+----------------+-------------+-------------+
131-
| **distgit** | 20G | 80G |
132-
+----------------+-------------+-------------+
133-
| **pulp** | 20G | TODO |
134-
+----------------+-------------+-------------+
135-
136-
- Turn on the ``Encrypted`` option
137-
- Select ``KMS key`` to whatever is ``(default)``
138-
139-
140-
8. Advanced details
141-
...................
142-
143-
- ``Termination protection`` - ``Enable``
144-
35+
Preparation
36+
-----------
14537

146-
9. Launch instance
147-
..................
38+
Make sure you have the `helper playbook repository`_ cloned locally, and step
39+
into the clone directory.
40+
41+
At this point, please review the ``dev.yml``, ``prod.yml`` and ``all.yml``
42+
configuration files in the ``./group_vars`` directory. In particular, review all
43+
the ``old_instance_id``, ``old_network_id`` and data volume IDs; **these REALLY
44+
NEED to match EC2 reality!**
45+
46+
You are going to run these playbooks on your machine::
47+
48+
play-vm-migration-01-new-box.yml
49+
play-vm-migration-02-migrate-backend-box.yml
50+
play-vm-migration-02-migrate-non-backend-box.yml
51+
play-vm-migration-03-rename-instances.yml
52+
53+
While doing so, you will have to specify two Ansible variables explicitly:
54+
``copr_instance`` (either ``dev`` or ``prod``) and ``server_id`` (one of
55+
``frontend``, ``backend``, ``distgit`` or ``keygen``). An example command
56+
looks like::
57+
58+
$ opts=( -e copr_instance=dev -e server_id=keygen )
59+
$ ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}"
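
Because a typo in either variable would point the playbook at the wrong machine, it may help to validate both values before building the options. The following is only a sketch: ``build_opts`` is a hypothetical helper that does not exist in the helper playbook repository, and the accepted values are simply the ones listed above.

```shell
# Hypothetical helper, not part of the helper playbook repository.
# Prints the ansible-playbook options one per line, or fails on bad input.
build_opts() {
    # $1 is the copr_instance value
    case $1 in
        dev|prod) ;;
        *) echo "copr_instance must be 'dev' or 'prod'" >&2; return 1 ;;
    esac
    # $2 is the server_id value
    case $2 in
        frontend|backend|distgit|keygen) ;;
        *) echo "server_id must be frontend, backend, distgit or keygen" >&2; return 1 ;;
    esac
    printf '%s\n' -e "copr_instance=$1" -e "server_id=$2"
}

# Usage (bash):
#   out=$(build_opts dev keygen) && mapfile -t opts <<<"$out"
#   ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}"
```

The function only prints options and never runs a playbook itself, so it is safe to try out before the real migration.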
60+
61+
Decide which AMI (golden image) you want to use when starting new instances.
62+
We typically upgrade to ``Fedora N+2``, e.g. we migrate the infrastructure from
63+
Fedora 37 to Fedora 39. Navigate to the `Cloud Base Images`_ download page, find
64+
the section for **Intel and AMD x86_64 systems**, and click the button next to the
65+
**Fedora Cloud 39 AWS** column (JavaScript needs to be enabled!). Note the
66+
``ami-*`` ID in the **US East (N. Virginia)** region (e.g.
67+
``ami-0746fc234df9c1ee0``). This ``ami-*`` ID needs to be specified in
68+
``group_vars/all.yml``, and both ``group_vars/{dev,prod}.yml``
69+
need to refer to it correctly.
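
Since the ID is copied by hand from a web page, a quick format check can catch a truncated paste before it lands in ``group_vars/all.yml``. This is a minimal sketch under the assumption that AMI IDs consist of the ``ami-`` prefix followed by 8 or 17 lowercase hexadecimal digits; ``is_ami_id`` is a hypothetical helper, and it does not verify that the image actually exists in the region.

```shell
# Hypothetical check, not part of the helper playbook repository.
# Succeeds when $1 looks like an AMI ID: "ami-" plus 8 or 17 hex digits
# (assumed format; existence of the image in EC2 is NOT verified).
is_ami_id() {
    printf '%s\n' "$1" | grep -Eq '^ami-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Usage:
#   is_ami_id ami-0746fc234df9c1ee0 && echo "looks like an AMI ID"
```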
70+
71+
You can double check other machine parameters like instance types (when more
72+
power is needed, please use the `ec2instances.info`_ comparator to get
73+
the cheapest available instance type according to our needs), naming, tags, IP
74+
addresses, root volume sizes, etc. But typically, the defaults will be good
75+
as-is.
14876

149-
Click ``Launch instance`` in the right panel.
77+
.. note::
78+
The ``group_vars/`` directory is the ultimate source of truth for the Fedora
79+
Copr instance, so please update the configuration whenever you change
80+
the instance parameters.
15081

82+
Make sure to use the existing key pair named ``Ansible Key``. This allows us to
83+
**first** run the playbooks on ``batcave01`` box against the newly spawned VM
84+
(the playbook then enables the Fedora Copr team members to ssh using their own
85+
keys, as uploaded to FAS).
15186

152-
Add names for the root volumes
153-
------------------------------
87+
Launch new instances
88+
--------------------
15489

155-
Once the instance is created, go to its details, switch to the
156-
``Storage`` tab, and go through all attached volumes. Set the ``Name``
157-
tag for each of them. Use the name of the instance as a prefix, e.g.
158-
``copr-keygen-dev-root``, ``copr-frontend-prod-root``, etc.
90+
This should be as simple as::
15991

92+
$ ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}"
16093

16194
Backup the current letsencrypt certificates
16295
-------------------------------------------
@@ -170,21 +103,26 @@ Copy the certificate files by running the playbooks **against the current (old)
170103
copr stack** (all machines). There's the ``-t certbot`` ansible tag that allows
171104
you to speedup the playbook runs.
172105

173-
174-
Pre-prepare the new VM
175-
----------------------
106+
Pre-prepare the new VM - backend only!
107+
--------------------------------------
176108

177109
.. note::
178110

179-
Backend - It's possible to run the playbook against the new copr-backend
180-
server before we actually shut-down the old one. But to make sure that
181-
ansible won't complain, we need
111+
It's possible to run the playbook against the new copr-backend server before
112+
we actually shut the old one down. But to make sure that ansible won't
113+
complain, we need
114+
115+
- A temporary volume attached to the new box providing an ext4 filesystem
116+
with the ``copr-repo`` label.
117+
118+
- An existing temporary hostname (with an existing DNS record) to execute the
119+
playbook against.
120+
121+
The volume, DNS record and corresponding Elastic IP for this purpose are
122+
already prepared. The ``play-vm-migration-01-new-box.yml`` playbook should
123+
already make them available.
124+
182125

183-
- A volume attached to the new box with label 'copr-repo'. Use already
184-
existing volume named ``data-copr-be-dev-initial-playbook-run``
185-
- An existing complementary DNS record (``copr-be-temp`` or
186-
``copr-be-dev-temp``). poiting to the non-elastic IP of the new
187-
server. See the `DNS SOP`_.
188126

189127

190128
Note the private IP addresses
@@ -514,7 +452,9 @@ Close the infrastructure ticket, the upgrade is done.
514452
.. _`Fedora infrastructure issue #7966`: https://pagure.io/fedora-infrastructure/issue/7966
515453
.. _`fedora devel`: https://lists.fedorahosted.org/archives/list/devel@lists.fedoraproject.org/
516454
.. _`copr devel`: https://lists.fedoraproject.org/archives/list/copr-devel@lists.fedorahosted.org/
517-
.. _`Amazon AWS`: https://id.fedoraproject.org/saml2/SSO/Redirect?SPIdentifier=urn:amazon:webservices&RelayState=https://console.aws.amazon.com
518-
.. _`Cloud Base Images`: https://alt.fedoraproject.org/cloud/
455+
.. _`Amazon AWS account`: https://id.fedoraproject.org/saml2/SSO/Redirect?SPIdentifier=urn:amazon:webservices&RelayState=https://console.aws.amazon.com
456+
.. _`Cloud Base Images`: https://fedoraproject.org/cloud/download/
519457
.. _`DNS SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/dns/
520458
.. _`ec2instances.info`: https://ec2instances.info/
459+
.. _`helper playbook repository`: https://github.com/fedora-copr/ansible-fedora-copr
460+
.. _`playbook SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/ansible/

doc/raid_on_backend.rst

Lines changed: 2 additions & 2 deletions
@@ -54,7 +54,7 @@ Adding more space
5454
1. Create two ``gp3`` volumes in EC2 of the same size and type, tag them with
5555
``FedoraGroup: copr``, ``CoprInstance: production``, ``CoprPurpose:
5656
infrastructure``. Attach them to a freshly started temporary instance (we
57-
don't want to overload I/O with the `initial RAID sync <mdadm_sync>`_ on
57+
don't want to overload I/O with the :ref:`initial RAID sync <mdadm_sync>` on
5858
production backend). Make sure the instance type has enough EBS throughput
5959
to perform the initial sync quickly enough.
6060

@@ -68,7 +68,7 @@ Adding more space
6868

6969
$ mdadm --create --name=raid-be-03 --verbose /dev/mdXYZ --level=1 --raid-devices=2 /dev/nvmeXn1p1 /dev/nvmeYn1p1
7070

71-
Wait till the new empty `array is synchronized <mdadm_sync>`_ (may take hours
71+
Wait till the new empty :ref:`array is synchronized <mdadm_sync>` (may take hours
7272
or days, note we sync 2x16T). Check the details with ``mdadm -Db
7373
/dev/md/raid-be-03``. See the tips below on how to make the sync speed
7474
unlimited with ``sysctl``.
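
To tell whether the sync is still running without reading the ``mdadm`` output by eye, one option is to look for a progress line in ``/proc/mdstat``. The sketch below assumes that the kernel marks an ongoing or pending operation with ``resync``, ``recovery`` or ``reshape`` followed by ``=``; ``sync_in_progress`` is a hypothetical helper that reads mdstat-formatted text on standard input.

```shell
# Hypothetical helper: succeeds while mdstat-formatted text on stdin still
# reports a resync, recovery or reshape (running, delayed or pending).
sync_in_progress() {
    grep -Eq '(resync|recovery|reshape) *='
}

# Usage: poll once a minute until no array is syncing anymore.
#   while sync_in_progress < /proc/mdstat; do sleep 60; done
```

Alternatively, ``mdadm --wait /dev/md/raid-be-03`` blocks until any resync, recovery or reshape activity on the given device finishes.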
