diff --git a/doc/database_backup.rst b/doc/database_backup.rst
index e6bf430f8..d14eb6998 100644
--- a/doc/database_backup.rst
+++ b/doc/database_backup.rst
@@ -1,18 +1,50 @@
 .. _database_backup:
 
-Database backup
-===============
+Database backups
+================
 
-We periodically create a databse dump and offer users to download
-it. At the same time, it can be used as a database backup if something
-wrong happens. Please see ``/etc/cron.d/cron-backup-database-coprdb``.
+We periodically create two kinds of database dumps.
 
-To backup the database manually (this can be useful e.g. before
-upgrading to a new major version of PostgreSQL), run::
+Private/backup dump
+-------------------
+
+This "complete" dump is done for potential disaster-recovery situations. It
+contains all the data (including private stuff like API tokens), and therefore
+we **never publish it or download it onto our machines**. The dump is created in
+the ``/backups/`` directory on Copr Frontend, and it is periodically pulled by
+an rdiff-backup Fedora Infrastructure bot `configured by Ansible
+`_.
+
+To generate the backup manually (this can be useful e.g. before upgrading to a
+new major version of PostgreSQL), run::
 
     [root@copr-fe ~][PROD]# su - postgres
     bash-5.0$ /usr/local/bin/backup-database coprdb
 
-Please be aware that the script does ``sleep`` for some
-undeterministic amount of time. You might want to kill the ``sleep``
-process to speed it up a little.
+.. warning::
+
+   Please be aware that the script does an initial ``sleep`` for some
+   nondeterministic amount of time (to avoid backing up all the Fedora Infra
+   databases at the same time). You might want to kill the ``sleep`` process to
+   speed it up a little. Still, be prepared that the dump, mostly because of
+   the XZ compression, takes more than 20 minutes!
+
+.. 
warning::
+
+   If you run this manually to have the :ref:`last-minute pre-upgrade dump
+   `, you need to **keep the machine
+   running** until the upgrade is done, so that the ``/backups`` directory
+   keeps existing!
+
+Public dumps
+------------
+
+These dumps are `publicly available
+`_ for anyone's experiments.
+These are generated overnight via::
+
+   /etc/cron.d/cron-backup-database-coprdb
+
+Those dumps have all the private data filtered out (namely the contents of
+``_private`` tables), but are still usable as-is for debugging purposes (e.g.
+spawning a testing Copr Frontend container with a pre-generated database).
diff --git a/doc/how_to_release_copr.rst b/doc/how_to_release_copr.rst
index 4f99e3614..4cf384ccd 100644
--- a/doc/how_to_release_copr.rst
+++ b/doc/how_to_release_copr.rst
@@ -93,7 +93,7 @@ Check that .repo files correctly points to ``@copr/copr``. And run on batcave01.
 
 .. note::
    If there is a new version of copr-rpmbuild, follow the
-   :ref:`terminate_os_vms` and :ref:`terminate_resalloc_vms` instructions.
+   :ref:`terminate_resalloc_vms` instructions.
 
 Make sure expected versions of Copr packages are installed on the dev
 instances::
 
@@ -215,31 +215,8 @@ notes against Copr git repository.
 
 Schedule and announce the outage
 ................................
 
-.. warning::
-
-   Schedule outage even if it has to happen in the next 5 minutes!
-
-Get faimiliar with the `Fedora Outage SOP `_.
-In general, please follow these steps:
-
-1. Prepare the infrastructure ticket similar to `this old one `_.
-
-2. Send email to `copr-devel`_ mailing list informing about an upcomming
-   release. We usually copy-paste text of the infrastructure ticket created in a
-   previous step. Don't forget to put a link to the ticket at the end of the
-   email. See the `example `_.
-
-3. 
Send ``op #fedora-buildsys MyIrcNick`` message to ``ChanServ`` on
-   libera.chat to get the OP rights, and then adjust the channel title so it
-   starts with message similar to::
-
-      Planned outage 2022-08-17 20:00 UTC
-      https://pagure.io/fedora-infrastructure/issue/10854
-
-4. Create a new "planned" `Fedora Status SOP`_ entry.
-5. Create warning banner on Copr homepage::
-
-      copr-frontend warning-banner --outage_time "2022-12-31 13:00-16:00 UTC" --ticket 1234
-
+See a specific document :ref:`announcing_fedora_copr_outage`, namely the
+"planned" outage state.
 
 Release window
 --------------
@@ -248,16 +225,10 @@ If all the pre-release preparations were done meticulously and everything
 was tested properly, the release window shouldn't take more than ten minutes.
 That is, if nothing goes terribly sideways...
 
-
 Let users know
 --------------
 
-1. Change the "planned" `Fedora Status SOP`_ entry into an "ongoing" entry.
-
-2. Announce on ``#fedora-buildsys``, change title like
-   ``s/Planned outage ../OUTAGE NOW .../`` and send some message like
-   ``WARNING: The scheduled outage just begings!``.
-
+See :ref:`announcing_fedora_copr_outage` again, namely the "ongoing" state.
 
 Production infra tags
 ---------------------
@@ -371,24 +342,8 @@ If schema was modified you should generate new Schema documentation.
 
 Announce the end of the release
 ...............................
 
-1. Remove the "Outage" note from the ``#fedora-buildsys`` title.
-
-2. Send a message on ``fedora-buildsys`` that the outage is over!
-
-3. Send email to `copr-devel`_ mailing list. If there is some important change
-   you can send email to fedora devel mailing list too. Mention the link to the
-   "Highlights from XXXX-XX-XX release" documentation page.
-
-4. Propose a new "highlights" post for the `Fedora Copr Blog`_,
-   see `the example
-   `_.
-
-5. Close the Fedora Infra ticket.
-
-6. Change the "ongoing" `Fedora Status SOP`_ entry into a "resolved" one.
-
-7. 
Remove the warning banner from frontend page using - ``copr-frontend warning-banner --remove`` +See a specific document :ref:`announcing_fedora_copr_outage`, the "resolved" +section. Release packages to PyPI @@ -446,6 +401,4 @@ Fix this document to make it easy for the release nanny of the next release to u .. _`Copr release directory`: https://releases.pagure.org/copr/copr .. _`copr-devel`: https://lists.fedoraproject.org/archives/list/copr-devel@lists.fedorahosted.org/ -.. _`Fedora Status SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/status-fedora/ .. _`example stg infra repo`: https://kojipkgs.fedoraproject.org/repos-dist/f36-infra-stg/ -.. _`Fedora Copr Blog`: https://fedora-copr.github.io/ diff --git a/doc/how_to_upgrade_builders.rst b/doc/how_to_upgrade_builders.rst index ea591cb59..c2924c2af 100644 --- a/doc/how_to_upgrade_builders.rst +++ b/doc/how_to_upgrade_builders.rst @@ -8,7 +8,6 @@ This article explains how to upgrade the Copr builders images in - :ref:`AWS ` (x86_64 and aarch64), - :ref:`LibVirt/OpenStack ` (x86_64 and ppc64le), - :ref:`IBM Cloud ` (s390x), -- |ss| :ref:`We currently don't work with OpenStack separately ` |se|. This HOWTO is useful for upgrading images to a newer Fedora release, or for just updating all the packages contained within the builder images. This image diff --git a/doc/how_to_upgrade_builders_openstack.rst b/doc/how_to_upgrade_builders_openstack.rst deleted file mode 100644 index 646bcdad1..000000000 --- a/doc/how_to_upgrade_builders_openstack.rst +++ /dev/null @@ -1,113 +0,0 @@ -.. _how_to_upgrade_builders_openstack: - -.. note:: There's currently no OpenStack instance in Fedora infrastructure, so - this documentation exists for the historical reference (we might get - another OpenStack instance in the future). - - -Prepare OpenStack source images -------------------------------- - -(x86_64 and ppc64le architectures) - -For OpenStack, there is an image registry on `OpenStack images dashboard`_. 
By -default you see only the project images; to see all of them, click on the -``Public`` button. - -Search for the ``Fedora-Cloud-Base-*`` images of the particular Fedora. Are -both x86_64 and ppc64le images available? Then you can jump right to the next -section. - -Download the image, and upload it to the infra OpenStack. Be careful to keep -sane ``Fedora-Cloud-Base*`` naming, and to make it public, so others can later -use it as well: - -:: - - $ wget - .. downloaded Fedora-Cloud-Base-30-1.2.x86_64.qcow2 .. - $ source - # hw_rng_model=virtio is needed to guarantee enough entropy on VMs - # --public is needed to publish it to everyone - # --protected so other openstack users can not delete it - $ openstack image create \ - --file Fedora-Cloud-Base-30-1.2.x86_64.qcow2 \ - --public \ - --protected \ - --disk-format qcow2 \ - --container-format bare \ - --property architecture=x86_64 \ - --property hw_rng_model=virtio \ - Fedora-Cloud-Base-30-1.2.x86_64 - -Note also the ``--property hw_rng_model=virtio`` option which guarantees that -the VMs won't wait indefinitely for random seed. - - -Prepare VM for snapshot -^^^^^^^^^^^^^^^^^^^^^^^ - -Open a ssh connection to ``copr-be-dev.cloud.fedoraproject.org`` and run:: - - # su - copr - $ copr-builder-image-prepare-cloud.sh os:x86_64 Fedora-Cloud-Base-30-1.2.x86_64 # or ppc64le - ... snip ... - TASK [disable offloading] ***************************************************** - Wednesday 14 August 2019 13:31:27 +0000 (0:00:05.603) 0:03:47.402 ****** - changed: [172.25.150.72] - ... snip .... - -It can fail (for various reasons, missing packages, changes in Fedora, etc.). -But after running the script, you will get an IP address of a spawned builder. -You can ssh into that builder, make changes and try to debug. 
Then, knowing -where the problem is - fix the following playbook files:: - - /home/copr/provision/provision_builder_tasks.yml - /home/copr/provision/builderpb_nova.yml - /home/copr/provision/builderpb_nova_ppc64le.yml - -Repeat the fixing of playbooks till the script finishes properly:: - - $ copr-builder-image-prepare-cloud.sh os:x86_64 Fedora-Cloud-Base-30-1.2.x86_64 - ... see the output instructions ... - TASK [disable offloading] ***************************************************** - Wednesday 14 August 2019 13:31:27 +0000 (0:00:05.603) 0:03:47.402 ****** - changed: [172.25.150.72] - ... snip .... - Request to stop server Copr_builder_20901443 has been accepted. - Please go to https://fedorainfracloud.org/ page, log-in and find the instance - - Copr_builder_20901443 - - Check that it is in SHUTOFF state. Create a snapshot from that instance, name - it "copr-builder-x86_64-f30-20190814_133128". Once snapshot is saved, run: - - $ copr-builder-image-fixup-snapshot-os.sh copr-builder-x86_64-f30-20190814_133128 - - And continue with - https://docs.pagure.org/copr.copr/how_to_upgrade_builders.html#how-to-upgrade-builders - -Once done, continue with the manual steps from the instructions on the -command-line output (create image snapshot and run the -``copr-builder-image-fixup-snapshot-os.sh`` script). Those manual steps could be done -automatically, but `Fedora Infra OpenStack`_ refuses snapshot API requests for -some reason. - - -Finishing up OpenStack images -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Since you have a new image name(s) which can be used on builders, you can -configure ``copr_builder_images`` option in -``/home/copr/provision/nova_cloud_vars.yml`` variable file. Since now, the -**development** backend should spawn from new image. 
You can try to kill all -the old builders, and check the spawner log what is happening:: - - [copr@copr-be-dev ~][STG]$ cleanup_vm_nova.py --kill-also-unused - [copr@copr-be-dev ~][STG]$ tail -f /var/log/copr-backend/spawner.log - -Try to build some packages and you are done. - - -.. _`OpenStack images dashboard`: https://fedorainfracloud.org/dashboard/project/images/ -.. _`Fedora Infra OpenStack`: https://fedorainfracloud.org diff --git a/doc/how_to_upgrade_persistent_instances.rst b/doc/how_to_upgrade_persistent_instances.rst index 4e4bc0134..951411c11 100644 --- a/doc/how_to_upgrade_persistent_instances.rst +++ b/doc/how_to_upgrade_persistent_instances.rst @@ -1,512 +1,294 @@ .. _how_to_upgrade_persistent_instances: .. _how_to_upgrade_persistent_instances_aws: -How to upgrade persistent instances (Amazon AWS) -************************************************ - -.. note:: - This document is specific to Amazon AWS. For OpenStack, see - :ref:`this outdated one `. - -This article describes how to upgrade persistent instances (e.g. copr-fe-dev) to -a new Fedora version. +How to Upgrade Fedora Copr Persistent VMs (Amazon AWS) +****************************************************** +This document describes the process of upgrading persistent VM instance(s) +(e.g., ``copr-fe-dev.aws.fedoraproject.org``) to a new Fedora version by +creating a completely new VM to replace the old one. Requirements ============ -* access to `Amazon AWS`_ -* ssh access to batcave01 -* permissions to update aws.fedoraproject.org DNS records - - +* Access to the team's `Amazon AWS account`_ and proper configuration of that account according to the `README.md `_. +* Permissions to run playbooks on `batcave01 `_. +* Since we do not modify the public IPs (neither v4 nor v6), no DNS + modifications should be required. However, familiarize yourself with the `DNS + SOP`_ in case of any issues. 
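Because the whole plan relies on the public IPs (and thus DNS answers) staying the same, it can help to snapshot the current DNS records before starting and compare them once the new VMs are up. A minimal sketch; the two hostnames are the dev instances mentioned in this guide, so extend the list as needed:

```shell
# Record the current A/AAAA records for the machines being upgraded, so the
# "no DNS modifications should be required" assumption can be verified after
# the migration. Hostnames shown are the dev instances; adjust as needed.
for host in copr-fe-dev.aws.fedoraproject.org copr-be-dev.aws.fedoraproject.org; do
    printf '%s:\n' "$host"
    getent ahosts "$host" | awk '{print $1}' | sort -u
done
```

Run the same loop after the migration and diff the two outputs; any difference means the `DNS SOP`_ steps are needed after all.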
Pre-upgrade =========== -The goal is to do as much work pre-upgrade as possible while focusing -only on important things and not creating a work overload with tasks, -that can be done post-upgrade. - -Don't do the pre-upgrade too long before the actual upgrade. Ideally a couple of -hours or a day before. - - -Launch a new instance ---------------------- - -First, login into `Amazon AWS`_, otherwise the following step will not -work. Once you are logged-in, feel free to close the page. - - -1. Choose AMI -............. - -Navigate to the `Cloud Base Images`_ download page and scroll down to -the section with cloud base images for Amazon public cloud. Use -``Click to launch`` button to launch an instance from the x86_64 -AMI. Select the US East (N. Virginia) region. - -You will get redirected to the Amazon AWS page. - - -2. Name and tags -................ - -- Set ``Name`` and add ``-new`` suffix (e.g. ``copr-distgit-dev-new`` - or ``copr-distgit-prod-new``) -- Set ``CoprInstance`` to ``devel`` or ``production`` -- Set ``CoprPurpose`` to ``infrastructure`` -- Set ``FedoraGroup`` to ``copr`` - - -3. Application and OS Images (Amazon Machine Image) -................................................... - -Skip this section, we already chose the correct AMI from the Fedora -website. - - -4. Instance type -................ 
- -Currently, we use the following instance types: - -+----------------+-------------+-------------+ -| | Dev | Production | -+================+=============+=============+ -| **frontend** | t3a.medium | t3a.xlarge | -+----------------+-------------+-------------+ -| **backend** | t3a.medium | m5a.4xlarge | -+----------------+-------------+-------------+ -| **keygen** | t3a.small | t3a.xlarge | -+----------------+-------------+-------------+ -| **distgit** | t3a.medium | t3a.medium | -+----------------+-------------+-------------+ -| **pulp** | t3a.medium | TODO | -+----------------+-------------+-------------+ - -When more power is needed, please use the `ec2instances.info`_ comparator to get -the cheapest available instance type according to our needs. - - -5. Key pair (login) -................... - -- Make sure to use existing key pair named ``Ansible Key``. This allows us to - run the playbooks on ``batcave01`` box against the newly spawned VM. +The goal is to complete as much pre-upgrade work as possible while focusing on +minimizing the **outage window** and only performing essential tasks that cannot +be done post-upgrade. +Avoid conducting the pre-upgrade too far in advance of the actual upgrade. +Ideally, perform this phase a couple of hours or a day before. -6. Network settings -................... +Announce the outage +------------------- -- Click the ``Edit`` button in the box heading to show more options -- Select VPC ``vpc-0af***********972`` -- Select ``Subnet`` to be ``us-east-1c`` -- Switch ``Auto-assign IPv6 IP`` to ``Enable`` -- Switch to ``Select existing security group`` and pick one of +See a specific document :ref:`announcing_fedora_copr_outage`, namely the +"planned" outage state. - - ``copr-frontend-sg`` - - ``copr-backend-sg`` - - ``copr-distgit-sg`` - - ``copr-keygen-sg`` - - ``copr-pulp-sg`` - - -7. Configure storage -.................... 
- -- Click the ``Advanced`` button in the box heading to show more options -- Update the ``Size (GiB)`` of the root partition - -+----------------+-------------+-------------+ -| | Dev | Production | -+================+=============+=============+ -| **frontend** | 50G | 50G | -+----------------+-------------+-------------+ -| **backend** | 20G | 100G | -+----------------+-------------+-------------+ -| **keygen** | 10G | 20G | -+----------------+-------------+-------------+ -| **distgit** | 20G | 80G | -+----------------+-------------+-------------+ -| **pulp** | 20G | TODO | -+----------------+-------------+-------------+ +Preparation +----------- -- Turn on the ``Encrypted`` option -- Select ``KMS key`` to whatever is ``(default)`` +Ensure you have the `helper playbook repository`_ cloned locally and navigate to +the clone directory. +Review the ``dev.yml``, ``prod.yml``, and ``all.yml`` configurations in the +``./group_vars`` directory. Pay particular attention to the ``old_instance_id``, +``old_network_id``, and data volume IDs as **these MUST match the EC2 reality**. -8. Advanced details -................... +In the following moments, you will run several playbooks on your machine. +During execution, explicitly specify two Ansible variables, ``copr_instance`` +(set to either ``dev`` or ``prod``) and ``server_id`` (set to either +``frontend``, ``backend``, ``distgit``, or ``keygen``). For example:: -- ``Termination protection`` - ``Enable`` + $ opts=( -e copr_instance=dev -e server_id=keygen ) + $ ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}" +Identify the AMI (golden images) you want to use for the new VM instances. +Typically, upgrade to ``Fedora N+2`` (e.g., migrating infrastructure from Fedora +37 to Fedora 39). Visit the `Cloud Base Images`_ download page, locate the +**Intel and AMD x86_64 systems** section, and click the button next to +**Fedora Cloud 39 AWS** (ensure JavaScript is enabled for this page!). 
+Note the ``ami-*`` ID in the **US East (N. Virginia)** region (for example
+``ami-0746fc234df9c1ee0``). Specify this ``ami-*`` ID in
+``group_vars/all.yml``, and ensure both ``group_vars/{dev,prod}.yml`` correctly
+reference it.
 
+Double-check other machine parameters such as instance types, names, tags, IP
+addresses, root volume sizes, etc. Usually, the pre-filled defaults suffice,
+but verification is recommended.
 
+Use the `ec2instances.info`_ comparator to find the cheapest available instance
+type that meets our needs whenever more power is required.
 
+.. warning::
 
-Add names for the root volumes
------------------------------
+   The ``group_vars/`` directory serves as the primary source of truth for the
+   Fedora Copr instances. Update the configuration in this directory whenever
+   you ad-hoc modify some EC2 instance parameters in the future!
 
-Once the instance is created, go to its details, switch to the
-``Storage`` tab, and go through all attached volumes. Set the ``Name``
-tag for each of them. Use the name of the instance as a prefix, e.g.
-``copr-keygen-dev-root``, ``copr-frontend-prod-root``, etc.
+The key pair named ``Ansible Key`` must be used. This allows us
+to initially run the playbooks from the ``batcave01`` box against the newly
+spawned VM. The playbooks ensure that, subsequently, Fedora Copr team members
+can SSH using their own keys, uploaded to FAS.
 
+Backup the Current Let's Encrypt Certificates
+---------------------------------------------
 
-Backup the current letsencrypt certificates
-------------------------------------------
+We copy the certificate files used on the old set of VMs onto the new VMs.
+These certificates will remain in use until automatically renewed by the
+certbot daemon. First, copy the certificate files to ``batcave01`` by
+running the playbooks with the ``-t certbot`` option. 
+For instance:: -The certificates files used on the old set of VMs need to be copy-pasted onto -the new set of VMs (at least initially, till they are automatically re-newed by -the certbot daemon). For this, we need to copy the certificate files to the -batcave server first. + $ sudo rbac-playbook -l copr-keygen.aws.fedoraproject.org groups/copr-keygen.yml -t certbot -Copy the certificate files by running the playbooks **against the current (old) -copr stack** (all machines). There's the ``-t certbot`` ansible tag that allows -you to speedup the playbook runs. +Do this for all the instances! +Launch new instances +-------------------- -Pre-prepare the new VM ----------------------- +As simple as:: -.. note:: + $ ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}" - Backend - It's possible to run the playbook against the new copr-backend - server before we actually shut-down the old one. But to make sure that - ansible won't complain, we need +You'll see an output like:: - - A volume attached to the new box with label 'copr-repo'. Use already - existing volume named ``data-copr-be-dev-initial-playbook-run`` - - An existing complementary DNS record (``copr-be-temp`` or - ``copr-be-dev-temp``). poiting to the non-elastic IP of the new - server. See the `DNS SOP`_. + ok: [localhost] => { + "msg": [ + "ElasticIP: not specified", + "Instance ID: i-04ba36eb360187572", + "Network ID: eni-048189f432f068270", + "Unused Public IP: 100.24.62.79", + "Private IP: 172.30.2.94" + ] + } +Now fix the corresponding ``new_instance_id`` and ``new_network_id`` options in +``group_vars/{dev,prod}.yml`` according to the output. -Note the private IP addresses +Note the Private IP addresses ----------------------------- Most of the communication within Copr stack happens on public interfaces via -hostnames with one exception. Communication between ``backend`` and ``keygen`` +hostnames with one exception. 
Communication between ``backend`` and ``keygen`` is done on a private network behind a firewall through IP addresses that change -when spawning a fresh instance. - -.. note:: - - Backend - Whereas after updating a ``copr-backend`` (or dev) instance change - the configuration in ``inventory/group_vars/copr_keygen_aws`` or - ``inventory/group_vars/copr_keygen_dev_aws`` and update the iptables rules:: +when spawning a fresh instances. - custom_rules: [ ... ] +So once you know the Backend's private IP, please do a `private IP change`_ in +ansible.git. - -Don't start the services after first playbook run -------------------------------------------------- +Don't start the services after the first playbook run +----------------------------------------------------- Set the ``services_disabled: true`` for your instance in ``inventory/group_vars/copr_*_dev_aws`` for devel, or ``inventory/group_vars/copr_*_aws`` for production. +Pre-prepare the new VM — backend only! +-------------------------------------- -Outage window -============= - -Once you start this section, try to be time-efficient because the services are -down and unreachable by users. - - -Stop the old services ---------------------- +.. note:: -Except for the ``lighttpd.service`` on the old copr-backend (still serving -repositories to users), and ``postgresql.service`` on the old copr-frontend (we -will need it to backup the database), stop all of our services. + Running the playbook against the new copr-backend server before shutting down + the old one is possible. This minimizes the outage duration with non-working + DNF repositories on the backend, which is highly desirable. -.. warning:: - Backend - You have to terminate existing resalloc resources. - See :ref:`Terminate resalloc resources `. 
- -+----------------+-------------------------------------------------------------+ -| | Command | -+================+=============================================================+ -| **frontend** | ``systemctl stop httpd fm-consumer@copr_messaging.service`` | -+----------------+-------------------------------------------------------------+ -| **backend** | ``systemctl stop copr-backend.target`` | -+----------------+-------------------------------------------------------------+ -| **keygen** | ``systemctl stop httpd signd`` | -+----------------+-------------------------------------------------------------+ -| **distgit** | ``systemctl stop copr-dist-git httpd`` | -+----------------+-------------------------------------------------------------+ -| **pulp** | ``TODO`` | -+----------------+-------------------------------------------------------------+ - -Stop all timers and cron jobs so they don't collide or talk with the newly -provisioned servers:: - - systemctl stop crond - systemctl stop *timer + However, to prevent any issues with Ansible, the following prerequisites are + necessary: -.. warning:: - Backend - Do not forget to kill all ``/usr/bin/prunerepo`` and - ``/usr/bin/copr-backend-process-build`` processes:: + - A temporary volume attached to the new box that provides an ext4 filesystem + with the ``copr-repo`` label. - kill `ps -o pid,cmd -ax | grep process-build | cut -d' ' -f1` + - An existing temporary hostname (having an existing DNS record) to execute + the playbook against it. - Ideally, you should wait until - ``/usr/bin/copr-backend-process-action`` processes gets finished. + The volume, DNS record, and corresponding Elastic IP for this purpose have + already been prepared by the ``play-vm-migration-01-new-box.yml`` playbook + mentioned above. +.. note:: + The following inventory configuration should already be prepared for you in + the "commented-out" form. 
-Umount data volumes from old instances
---------------------------------------
+Ensure that the ``copr-be-dev-temp.aws.fedoraproject.org`` is specified in the
+inventory in the following groups::
 
-.. warning::
-   Backend - Keep the backend volume mounted to the old instance. We will take
-   care of that later
+   copr_back_dev_aws
+   staging
+   cloud_aws
 
-.. note::
-   Frontend - On the new instance, it will be probably necessary to manually
-   upgrade the database to a new PostgreSQL version. This is our last chance to
-   :ref:`Backup the database ` before the upgrade. Do it.
+Similarly, use ``copr-be-temp.aws.fedoraproject.org`` in::
 
-   Once the backup is created, stop the PostgreSQL server::
+   copr_back_aws
+   cloud_aws
 
-      systemctl stop postgresql
+For both cases, set the ``birthday=yes`` variable for the temporary hostname::
 
+   [copr_back_dev_aws]
+   copr-be-dev.aws.fedoraproject.org
+   copr-be-dev-temp.aws.fedoraproject.org birthday=yes
 
-It might not be clear what data volumes are mounted. You can checkout
-``roles/copr/*/tasks/mount_fs.yml`` in the ansible playbooks to see the data
-volumes.
+On Batcave, execute the playbook against the temporary hostname::
 
-Umount data volumes and make sure everything is written::
+   $ sudo rbac-playbook -l copr-be-dev-temp.aws.fedoraproject.org groups/copr-backend.yml
+   $ sudo rbac-playbook -l copr-be-temp.aws.fedoraproject.org groups/copr-backend.yml
 
-   umount /the/data/directory/mount/point
-   sync
+Once the playbook finishes successfully, remember to revert the inventory
+changes made here (comment the temporary hostnames out again).
 
-Perhaps you can shutdown the instance (but you don't have to)::
+Outage window
+=============
 
-   shutdown -h now
+When initiating this section, aim for time efficiency as the services will be
+down and inaccessible to users.
 
+Let users know
+--------------
 
-Attach data volumes to the new instances
-----------------------------------------
+See :ref:`announcing_fedora_copr_outage` again, namely the "ongoing" state.
 
-.. 
warning:: - Backend - Keep the backend volume attached to the old instance. We will take - care of that later - -Open Amazon AWS web UI, select ``Volumes`` in the left panel, filter them with -``CoprPurpose: infrastructure`` and ``CoprInstance`` either ``devel`` or -``production``. Find the correct volume, select it, and ``Detach Volume``. - -+----------------+-------------------------+------------------------------+ -| | Dev | Production | -+================+=========================+==============================+ -| **frontend** | data-copr-fe-dev | data-copr-frontend-prod | -+----------------+-------------------------+------------------------------+ -| **backend** | data-copr-be-dev | data-copr-backend-prod | -+----------------+-------------------------+------------------------------+ -| **keygen** | data-copr-keygen-dev | data-copr-keygen-prod | -+----------------+-------------------------+------------------------------+ -| **distgit** | data-copr-distgit-dev | data-copr-distgit-prod | -+----------------+-------------------------+------------------------------+ -| **pulp** | data-copr-pulp-dev | TODO | -+----------------+-------------------------+------------------------------+ - -Once it is done, right-click the volume again, and click to ``Attach Volume`` -(it can be safely attached to a running instance). - - -Flip the elastic IPs --------------------- +Move IPs and Volumes to the New Instances +----------------------------------------- .. warning:: - Backend - Keep the backend elastic IP associated to the old instance. We will - take care of that later + Prepare to follow the instructions provided during the playbook run. You'll + need to perform manual steps such as DB backups, consistency checks, etc. -Except for copr-be, flip the Elastic IPs to the new instances. This is needed -to allow successful run of playbooks. +Migrate the data volumes and IP addresses to the new machine. For the Backend +case, a separate playbook is created. 
This playbook makes the
+`results directory <https://copr-be.cloud.fedoraproject.org/results/>`_
+unavailable temporarily, affecting every Copr consumer! Ensure that the
+``lighttpd`` service is running on the new server once the playbook finishes,
+and that it hosts the correct results::
 
+   $ ansible-playbook play-vm-migration-02-migrate-backend-box.yml "${opts[@]}"
 
+For the rest of the systems (Frontend, DistGit, Keygen), use::
 
+   $ ansible-playbook play-vm-migration-02-migrate-non-backend-box.yml "${opts[@]}"
 
+Provision the new instances
+---------------------------
 
-Provision new instance from scratch
------------------------------------
 
-In the fedora-infra ansible repository, edit ``inventory/inventory``
-file and set ``birthday=yes`` variable for your host, e.g.::
+In the fedora-infra ansible repository, edit the ``inventory/inventory`` file
+and set the ``birthday=yes`` variable for your updated host, for example::
 
     [copr_front_dev_aws]
     copr.stg.fedoraproject.org birthday=yes
 
-On batcave01 run playbook to provision the instance (ignore the playbook for
-upgrading Copr packages).
-
-.. note::
-   Backend - You need to **slightly modify the calls** to use `-l
-   copr-be*-temp...`.
-
-   To make the playbook work with the new `copr-be*-temp` DNS record, we have to
-   specify the host name on **TWO PLACES** in inventory inside ansible.git::
+This is necessary to instruct the first playbook run on ``batcave01`` to sign
+the new host certificates (avoiding later manipulation with ``known_hosts``). 
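The inventory tweak can be scripted so the flag is added and removed consistently. A sketch, assuming the fedora-infra ansible.git checkout layout (``inventory/inventory``) and the dev frontend host used in the example above:

```shell
# Sketch: toggle the birthday=yes flag on the frontend host line of the
# fedora-infra inventory. The default path assumes you run this inside a
# local ansible.git clone; point INVENTORY at a copy to experiment safely.
INVENTORY=${INVENTORY:-inventory/inventory}

add_birthday() {
    # Append the flag to the exact host line (and nothing else).
    sed -i 's/^copr\.stg\.fedoraproject\.org$/copr.stg.fedoraproject.org birthday=yes/' "$INVENTORY"
}

drop_birthday() {
    # Revert: strip the flag again after the first successful playbook run.
    sed -i 's/ birthday=yes$//' "$INVENTORY"
}
```

Call ``add_birthday`` before the first provisioning run, and ``drop_birthday`` once the run gets through the ``base`` role.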
- inventory/inventory -- copr_back_aws vs. copr_back_dev_aws groups - inventory/cloud -- cloud_aws - - If we don't, when the playbook is run, this breaks the nagios monitoring - miserably. - -For the dev instance, see +On ``batcave01``, execute the playbook to provision the instance (ignore the +playbook for upgrading Copr packages). For the dev instance, refer to https://docs.pagure.org/copr.copr/how_to_release_copr.html#upgrade-dev-machines -and for production, see +and for production, refer to https://docs.pagure.org/copr.copr/how_to_release_copr.html#upgrade-production-machines -It is possible that the playbook fails, it isn't important now. If the -provisioning gets at least thgourh the ``base`` role, revert the commit to -remove the ``birthday`` variable. - - -Dealing with backend --------------------- - -This is a backend-specific section. For other instaces, skip it completely. - -.. note:: - Backend - On the new `copr-be*-temp` hostname, stop the lighttpd - etc. and umount the temporary volume. It needs to be detached in - AWS cli, too. - -.. warning:: - Backend - You should **hurry up** and go through this section quickly. The - storage will be down and end-users will see failed `dnf update ...` - processes in terminals. - -.. note:: - Backend - Connect to the old instance via SSH. It doesn't have a hostname - anymore, so you will need to use its public IP address. - - Stop all services using the data volume, e.g.:: - - systemctl stop lighttpd - - Safely ummount the data volume - - See `Umount data volumes from old instances`_ - -.. note:: - Backend - Open Amazon AWS, detach the data volume from the old backend - instance, and a attach it to the new one. - - See `Attach data volumes to the new instances`_ - -.. note:: - Backend - Open Amazon AWS and finally flip the backend elastic IP address - from the old instance to the new one. - - See `Flip the elastic IPs`_ - -.. 
note::
-   Backend - Re-run the playbook again, this time with the correct hostname
-   (without ``-temp``) and drop the ``birthday=yes`` parameter.
-
+It's possible that the playbook fails, but it typically isn't crucial now. If
+provisioning at least reaches the end of the ``base`` role, revert the
+``birthday=yes`` commit and proceed with the next steps.
 
 Get it working
 --------------
 
-Re-run the playbook from previous section again, with dropped configuration::
+Rerun the playbook from the previous section, this time with the following
+configuration option dropped::
 
     services_disabled: false
 
-It's encouraged to start with backend so the repositories are UP again. Since
-we have fully working DNS and elastic IPs, even copr-backend playbook can be run
-with normal `-l` argument.
-
-It should get past mounting but it will most likely **not** succeed. At this
-point, you need to debug and fix the issues from running it. If required, adjust
-the playbook and re-run it again and again (pay attention to start lighttpd
-serving the repositories ASAP).
+It should proceed with mounting data volumes but will likely not succeed. Now,
+you'll need to debug and address the issues. If necessary, modify and rerun the
+playbook multiple times (keeping ``lighttpd`` running on the new backend the
+whole time).
 
 .. note::
-   Frontend - It will most likely be necessary to manualy upgrade the PostgreSQL
-   database once you migrated to the new Fedora (new PG major version).
-   See how to :ref:`Upgrade the database `.
-
-
-.. note::
-   Keygen - If you upgraded keygen before backend, you need to re-run keygen
-   playbook once more to allow the new backend private IP address in the
-   iptables.
-
-
-Update IPv6 addresses
----------------------
-
-Update the ``aws_ipv6_addr`` for your instance in
-``inventory/group_vars/copr_*_dev_aws`` for devel, or
-``inventory/group_vars/copr_*_aws`` for production.
-
-Then run the playbooks once more with ``-t ipv6_config`` and reboot the
-instance (or figure out a better way to get them working).
-
-
-Fix IPv6 DNS records
---------------------
-
-There is no support for Elastic IPs for IPv6, so we have to update AAAA records
-every time we spawn a new infrastructure machine. SSH to batcave, and setup the
-DNS records there according to the `DNS SOP`_.
-
+   Frontend - You'll likely need to manually upgrade the PostgreSQL database
+   once you migrate to the new Fedora (new PG major version). Refer to
+   :ref:`Upgrade the database `.
 
 Post-upgrade
 ============
 
-At this moment, every Copr service should be up and running.
+By this point, every Copr service should be operational.
 
+Rename the instances
+--------------------
 
-Drop suffix from instances names
---------------------------------
+Remove the ``-new`` name suffix from the new instances and add a ``-old`` suffix
+to the old instances. This playbook should be executed only once for all the
+infra instances::
 
-Open Amazon AWS web UI, select ``Instances`` in the left panel, and filter
-them with ``CoprPurpose: infrastructure``. Rename all instances
-without ``-new`` suffix to end with ``-old`` suffix. Then drop
-``-new`` suffix from the instances that have it.
-
-
-.. _`terminate_os_vms`:
+    $ opts=( -e copr_instance=dev )  # or prod
+    $ ansible-playbook play-vm-migration-03-rename-instances.yml "${opts[@]}"
 
 Terminate the old instances
 ---------------------------
 
-Once you don't need the old VMs, you can terminate them e.g. in Amazon web
-UI. You can do it right after the upgrade or wait a couple of days to be sure.
-
-The instances should be protected against accidental termination, and therefore
-you need to click ``Actions``, go to ``Instance settings``,
-``Change termination protection``, and disable this option.
+Once you no longer require the old VMs, you can terminate them using the Amazon
+web UI. 
You can do this immediately after the upgrade or wait a couple of days
+(e.g. to keep the DB ``/backups`` for a while just in case of any problems).
+The old VMs are protected against accidental termination. To disable this
+protection, click ``Actions``, navigate to ``Instance settings`` and then to
+``Change termination protection``.
 
 Final steps
 -----------
 
-Don't forget to announce on `fedora devel`_ and `copr devel`_ mailing lists and also on
-``#fedora-buildsys`` that everything should be working again.
-
-Close the infrastructure ticket, the upgrade is done.
-
-
+See the dedicated document :ref:`announcing_fedora_copr_outage`, in particular
+its "Resolved outage" section.
 
 .. _`Fedora Infra OpenStack`: https://fedorainfracloud.org
 .. _`OpenStack images dashboard`: https://fedorainfracloud.org/dashboard/project/images/
@@ -514,7 +296,10 @@ Close the infrastructure ticket, the upgrade is done.
 .. _`Fedora infrastructure issue #7966`: https://pagure.io/fedora-infrastructure/issue/7966
 .. _`fedora devel`: https://lists.fedorahosted.org/archives/list/devel@lists.fedoraproject.org/
 .. _`copr devel`: https://lists.fedoraproject.org/archives/list/copr-devel@lists.fedorahosted.org/
-.. _`Amazon AWS`: https://id.fedoraproject.org/saml2/SSO/Redirect?SPIdentifier=urn:amazon:webservices&RelayState=https://console.aws.amazon.com
-.. _`Cloud Base Images`: https://alt.fedoraproject.org/cloud/
+.. _`Amazon AWS account`: https://id.fedoraproject.org/saml2/SSO/Redirect?SPIdentifier=urn:amazon:webservices&RelayState=https://console.aws.amazon.com
+.. _`Cloud Base Images`: https://fedoraproject.org/cloud/download/
 .. _`DNS SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/dns/
 .. _`ec2instances.info`: https://ec2instances.info/
+.. _`helper playbook repository`: https://github.com/fedora-copr/ansible-fedora-copr
+.. _`playbook SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/ansible/
+.. 
_`private IP change`: https://pagure.io/fedora-infra/ansible/c/6c80a870ff2a62e73da98f7607574e534369fb37 diff --git a/doc/how_to_upgrade_persistent_instances_openstack.rst b/doc/how_to_upgrade_persistent_instances_openstack.rst deleted file mode 100644 index f3dbbf1f7..000000000 --- a/doc/how_to_upgrade_persistent_instances_openstack.rst +++ /dev/null @@ -1,251 +0,0 @@ -.. _how_to_upgrade_persistent_instances_openstack: - -How to upgrade persistent instances (OpenStack) -=============================================== - -.. warning:: - This document is specific to OpenStack and is outdated. For Amazon - AWS, see :ref:`this up-to-date one `. - -This article describes how to upgrade persistent instances (e.g. copr-fe-dev) to new Fedora version. - - -Requirements ------------- - -* an account on `Fedora Infra OpenStack`_ -* access to persistent tenant -* ssh access to batcave01 - - -Find source image ------------------ - -For OpenStack, there is an image registry on `OpenStack images dashboard`_. By -default you see only the project images; to see all of them, click on the -``Public`` button. - -Search for the ``Fedora-Cloud-Base-*`` images of the particular Fedora. Please note -that if there is a timestamp in the image name suffix than it is a beta version. -It is better to use images with numbered minor version. - -The goal in this step is just to find an image name. - - -Update the image in playbooks ------------------------------ - -Once the new image name is known, make sure it is set in `vars/global.yml`, e.g.:: - - fedora30_x86_64: Fedora-Cloud-Base-30-1.2.x86_64 - -Then edit the host vars for the instance:: - - vim inventory/host_vars/.fedorainfracloud.org - # e.g. - vim inventory/host_vars/copr-dist-git-dev.fedorainfracloud.org - -And configure it to use the new image:: - - image: "{{ fedora30_x86_64 }}" - -That is all, that needs to be changed in the ansible repository. Commit and push it. 
- - -Backup the old instance ------------------------ - -This part is done via ``openstack`` client on your computer. First, download an RC -file for the ``persistent`` tenant. Open `Fedora Infra OpenStack`_ dashboard, switch -to the ``Access & Security`` section, then ``API Access`` and click on -``Download OpenStack RC File``. - -Load the openstack settings:: - - source ~/Downloads/persistent-openrc.sh - -Backup the old instance by renaming it:: - - openstack server set --name _backup "" - # e.g. - openstack server set --name copr-dist-git-dev_backup "85260b5b-7f61-4398-8d05-xxxxxxxxxxxx" - - -.. warning:: backend - You have to terminate existing resalloc resources. - See `Terminate resalloc resources`_. - -.. warning:: backend - `Terminate OpenStack VMs`_. - -Finally, shut down the instance to avoid storage inconsistency and other possible problems:: - - $ ssh root@.fedorainfracloud.org - [root@copr-dist-git-dev ~][STG]# shutdown -h now - -Once the instance is halted, detach volume from the old instance:: - - openstack server remove volume "" "" - # e.g. - openstack server remove volume "52d97d72-5915-45c0-b223-xxxxxxxxxxxx" "9e2b4c55-9ec3-4508-af46-xxxxxxxxxxxx" - - -Provision new instance from scratch ------------------------------------ - -On batcave01 run playbook to provision the instance. For dev, see - -https://docs.pagure.org/copr.copr/how_to_release_copr.html#upgrade-dev-machines - -and for production, see - -https://docs.pagure.org/copr.copr/how_to_release_copr.html#upgrade-production-machines - -.. note:: Please note that the playbook may be stuck longer than expected while waiting for a new - instance to boot. See `Initial boot hangs waiting for entropy`_. - - -Get it working --------------- - -The playbook from the previous section will most likely **not** succeed. At this point, -you need to debug and fix the issues from running it. If required, adjust the playbook -and re-run it again and again. 
Most likely you will also need to attach a volume to it -in the `OpenStack instances dashboard`_. - -.. note:: frontend - It will most likely be necessary to manualy upgrade the database. - See `Upgrade the database`_. - -.. note:: backend - Copr backend requires an outdated version of python3-novaclient. - See `Downgrade python novaclient`_. - - -Terminate the old instance --------------------------- - -Once the new instance is successfully provisioned and working as expected, terminate the -old backup instance. - -Open the `OpenStack instances dashboard`_ and switch the current project to ``persistent`` -and find the instance, that you want to terminate. Make sure, it is the right one! Don't -mistake e.g. production instance with dev. Then look at the ``Actions`` column and click -``More`` button. In the dropdown menu, there is a button ``Terminate instance``, use it. - - -Final steps ------------ - -Don't forget to announce on `fedora devel`_ and `copr devel`_ mailing lists and also on -``#fedora-buildsys`` that everything should be working again. - -Close the infrastructure ticket. - - -Troubleshooting ---------------- - -Initial boot hangs waiting for entropy -...................................... - -Because of a known infrastructure issue `Fedora infrastructure issue #7966`_ initial boot -of an instance in OpenStack hangs and waits for entropy. It seems that it can't be fixed -properly, so we need to work around by going to `OpenStack instances dashboard`_, opening -the instance details, switching to the ``Console`` tab and typing random characters in it. -It resumes the booting process. - - -Private IP addresses -.................... - -Most of the communication within Copr stack happens on public interfaces via hostnames -with one exception. Communication between ``backend`` and ``keygen`` is done on a private -network behind a firewall through IP addresses that change when spawning a fresh instance. 
- -After updating a ``copr-keygen`` (or dev) instance, change its IP address in -``inventory/group_vars/copr_dev``:: - - keygen_host: "172.XX.XX.XX" - -Whereas after updating a ``copr-backend`` (or dev) instance change the configuration in -``inventory/group_vars/copr_keygen`` (or dev) and update the iptables rules:: - - custom_rules: [ ... ] - -Please note two addresses needs to be updated, both are backend's. - -Run provision playbooks for ``copr-backend`` and ``copr-keygen`` to propagate the changes -to the respective instances. - - -Terminate resalloc resources -............................ - -It is easier to close all resalloc tickets otherwise there will be dangling VMs -preventing the backend from starting new ones. - -Edit the ``/etc/resallocserver/pools.yaml`` file and in all section, set:: - - max: 0 - -Then delete all current resources:: - - su - resalloc - resalloc-maint resource-delete --all - - -Terminate OpenStack VMs -....................... - -Make sure you terminate all the OpenStack located builders allocated by -``copr-backend.service``:: - - # systemctl stop copr-backend # ensure that new are not allocated anymore - # su - copr - - # drop the builders from DB - $ redis-cli --scan --pattern 'copr:backend:vm_instance:hset::Copr_builder_*' | xargs redis-cli del - - # shutdown all the VMs which are not in DB - $ cleanup_vm_nova.py - - -Downgrade python novaclient -........................... - -Backend is dependent on ``python3-novaclient`` in prehistoric version ``3.3.1``. This -version is no longer supported and the spec file needed to be customized to build and -install only python3 package. Also, the epoch has been bumped so it doesn't get replaced -with a newer version. Please install this package from Copr project (even on production -instance):: - - dnf copr enable @copr/novaclient - dnf install python3-novaclient-2:3.3.1 - -.. note:: Please do not automatize this step in the playbook, so it forces us to deal - with the situation properly. 
-
-
-Upgrade the database
-....................
-
-When upgrading to a distribution that provides a new major version of PostgreSQL server,
-there is a manual intervention required.
-
-Upgrade the database::
-
-    [root@copr-fe-dev ~][STG]# dnf install postgresql-upgrade
-    [root@copr-fe-dev ~][STG]# postgresql-setup --upgrade
-
-
-And rebuild indexes::
-
-    [root@copr-fe-dev ~][STG]# su postgres
-    bash-5.0$ cd
-    bash-5.0$ reindexdb --all
-
-
-
-.. _`Fedora Infra OpenStack`: https://fedorainfracloud.org
-.. _`OpenStack images dashboard`: https://fedorainfracloud.org/dashboard/project/images/
-.. _`OpenStack instances dashboard`: https://fedorainfracloud.org/dashboard/project/instances/
-.. _`Fedora infrastructure issue #7966`: https://pagure.io/fedora-infrastructure/issue/7966
-.. _`fedora devel`: https://lists.fedorahosted.org/archives/list/devel@lists.fedoraproject.org/
-.. _`copr devel`: https://lists.fedoraproject.org/archives/list/copr-devel@lists.fedorahosted.org/
diff --git a/doc/maintenance/announce_outage.rst b/doc/maintenance/announce_outage.rst
new file mode 100644
index 000000000..c99e8e525
--- /dev/null
+++ b/doc/maintenance/announce_outage.rst
@@ -0,0 +1,78 @@
+.. _announcing_fedora_copr_outage:
+
+Fedora Copr outage announcements
+================================
+
+This document is primarily intended for planning outages due to future
+infrastructure updates. However, in the event of any incidents or accidents
+such as networking issues, IBM Cloud problems, Fedora Rawhide repository issues,
+or any other matters that affect users, it's advisable to refer to this document
+(possibly jumping directly to the "Ongoing outage" section).
+
+.. warning::
+
+   Schedule an outage even if it needs to occur within the next 5 minutes!
+
+Please familiarize yourself with the `Fedora Outage SOP`_. But in general,
+follow the steps outlined in this document.
+
+Planned outage
+--------------
+
+1. Prepare the infrastructure ticket similar to `this old one `_.
+
+2. 
Send an email to the `copr-devel`_ mailing list informing about the upcoming
+   release. We usually copy-paste the text of the infrastructure ticket created
+   in the previous step. Don't forget to put a link to the ticket at the end of
+   the email. See the `example `_.
+
+3. Adjust the `Matrix channel`_ title so it contains a message similar to::
+
+     Planned outage 2022-08-17 20:00 UTC - https://pagure.io/fedora-infrastructure/issue/10854
+
+4. Create a new "planned" `Fedora Status SOP`_ entry.
+
+5. Create a warning banner on the Copr homepage::
+
+     copr-frontend warning-banner --outage_time "2022-12-31 13:00-16:00 UTC" --ticket 1234
+
+
+Ongoing outage
+--------------
+
+When the outage begins to cause real effects:
+
+1. Change the "planned" `Fedora Status SOP`_ entry into an "ongoing" entry.
+
+2. Announce it on the `Matrix channel`_: change the title (think
+   ``s/Planned outage .../OUTAGE NOW .../``) and send a message like
+   ``WARNING: The scheduled outage has just begun!``.
+
+
+Resolved outage
+---------------
+
+1. Remove the "Outage" note from the `Matrix channel`_ title, and send a message
+   that the outage is over!
+
+2. Send an email to the `copr-devel`_ mailing list. If there is some important
+   change, you can send an email to the fedora devel mailing list too. Mention
+   the link to the "Highlights from XXXX-XX-XX release" documentation page.
+
+3. Propose a new "highlights" post for the `Fedora Copr Blog`_,
+   see `the example
+   `_.
+
+4. Close the Fedora Infra ticket.
+
+5. Change the "ongoing" `Fedora Status SOP`_ entry into a "resolved" one.
+
+6. Remove the warning banner from the frontend page using
+   ``copr-frontend warning-banner --remove``.
+
+
+.. _`copr-devel`: https://lists.fedoraproject.org/archives/list/copr-devel@lists.fedorahosted.org/
+.. _`Fedora Outage SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/outage/
+.. _`Fedora Status SOP`: https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/status-fedora/
+.. _`Fedora Copr Blog`: https://fedora-copr.github.io/
+.. 
_`Matrix channel`: https://matrix.to/#/#buildsys:fedoraproject.org diff --git a/doc/maintenance_documentation.rst b/doc/maintenance_documentation.rst index 4562e6893..796ef4f1d 100644 --- a/doc/maintenance_documentation.rst +++ b/doc/maintenance_documentation.rst @@ -15,6 +15,7 @@ This section contains information about maintenance topics. You may also be inte How to manage active chroots How to rename chroots Fedora Copr hypervisors + Fedora Copr outage announcements .. toctree:: @@ -54,11 +55,3 @@ This section contains information about maintenance topics. You may also be inte email_templates Sending notifications and removing data from outdated chroots Monitoring the service - - -.. toctree:: - :caption: Obsolete - :maxdepth: 1 - - how_to_upgrade_persistent_instances_openstack - how_to_upgrade_builders_openstack diff --git a/doc/raid_on_backend.rst b/doc/raid_on_backend.rst index 030c2102c..6ead7d913 100644 --- a/doc/raid_on_backend.rst +++ b/doc/raid_on_backend.rst @@ -54,7 +54,7 @@ Adding more space 1. Create two ``gp3`` volumes in EC2 of the same size and type, tag them with ``FedoraGroup: copr``, ``CoprInstance: production``, ``CoprPurpose: infrastructure``. Attach them to a freshly started temporary instance (we - don't want to overload I/O with the `initial RAID sync `_ on + don't want to overload I/O with the `initial RAID sync `_ on production backend). Make sure the instance type has enough EBS throughput to perform the initial sync quickly enough. @@ -68,7 +68,7 @@ Adding more space $ mdadm --create --name=raid-be-03 --verbose /dev/mdXYZ --level=1 --raid-devices=2 /dev/nvmeXn1p1 /dev/nvmeYn1p1 - Wait till the new empty `array is synchronized `_ (may take hours + Wait till the new empty `array is synchronized `_ (may take hours or days, note we sync 2x16T). Check the details with ``mdadm -Db /dev/md/raid-be-03``. See the tips bellow how to make the sync speed unlimited with ``sysctl``.