diff --git a/config.toml b/config.toml index 4a1bd8902..3f1377064 100644 --- a/config.toml +++ b/config.toml @@ -14,7 +14,7 @@ style = "paraiso-dark" [params] description = "Cloud native distributed block storage for Kubernetes" tagline = "Easy to use, 100% open source, run anywhere" -versions = ["1.4.2", "1.4.1", "1.4.0", "1.3.3", "1.3.2", "1.3.1", "1.3.0", "1.2.6", "1.1.3", "1.6.0-dev", "1.5.0-dev", "1.4.3-dev", "1.3.4-dev"] +versions = ["1.5.0", "1.4.2", "1.4.1", "1.4.0", "1.3.3", "1.3.2", "1.3.1", "1.3.0", "1.2.6", "1.1.3", "1.6.0-dev", "1.5.1-dev", "1.4.3-dev", "1.3.4-dev"] archived_versions = ["1.2.5", "1.2.4", "1.2.3", "1.2.2", "1.2.1", "1.2.0", "1.1.2", "1.1.1", "1.1.0", "0.8.0", "0.8.1"] alpine_js_version = "2.2.5" locale = "en_US" diff --git a/content/docs/1.5.1/_index.md b/content/docs/1.5.1/_index.md new file mode 100644 index 000000000..413b571f7 --- /dev/null +++ b/content/docs/1.5.1/_index.md @@ -0,0 +1,18 @@ +--- +title: The Longhorn Documentation +description: Cloud native distributed block storage for Kubernetes +weight: 1 +--- + +**Longhorn** is a lightweight, reliable, and powerful distributed [block storage](https://cloudacademy.com/blog/object-storage-block-storage/) system for Kubernetes. + +Longhorn implements distributed block storage using containers and microservices. Longhorn creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes. The storage controller and replicas are themselves orchestrated using Kubernetes. + +## Features + +* Enterprise-grade distributed block storage with no single point of failure +* Incremental snapshot of block storage +* Backup to secondary storage ([NFS](https://www.extrahop.com/resources/protocols/nfs/) or [S3](https://aws.amazon.com/s3/)-compatible object storage) built on efficient change block detection +* Recurring snapshots and backups +* Automated, non-disruptive upgrades. You can upgrade the entire Longhorn software stack without disrupting running storage volumes. +* An intuitive GUI dashboard \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/_index.md b/content/docs/1.5.1/advanced-resources/_index.md new file mode 100644 index 000000000..52d689fff --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/_index.md @@ -0,0 +1,4 @@ +--- +title: Advanced Resources +weight: 70 +--- \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/backing-image.md b/content/docs/1.5.1/advanced-resources/backing-image.md new file mode 100644 index 000000000..2b4f43724 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/backing-image.md @@ -0,0 +1,163 @@ +--- +title: Backing Image +weight: 4 +--- + +Longhorn natively supports backing images since v1.1.1. + +A QCOW2 or RAW image can be set as the backing/base image of a Longhorn volume, which allows Longhorn to be integrated with a VM like [Harvester](https://github.com/rancher/harvester). + +## Create Backing Image + +### Parameters during creation + +#### The data source of a backing image +There are 3 kinds of data sources. Or in other words, there are 3 ways to prepare a backing image file data: +1. Download a file from a URL as a backing image. +2. Upload a local file as a backing image. This is almost exclusive for UI. +3. Export an existing in-cluster volume as a backing image. 
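Whichever data source you pick, it can help to record the expected checksum of the image file up front, since Longhorn compares prepared files against it (see the next section). A minimal sketch, assuming the image file is available locally and that `qemu-img` is installed (both the tool and the file name are illustrative):

```shell
# Optional: confirm whether the image is qcow2 or raw before registering it
qemu-img info parrot.raw

# Compute the SHA512 checksum of the file as-is; this is the value to supply as the expected checksum
shasum -a 512 parrot.raw
```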
+ +#### The checksum of a backing image +- The checksum of a backing image is **the SHA512 checksum** of the whole backing image **file** rather than that of the actual content. + What's the difference? When Longhorn calculates the checksum of a qcow2 file, it will read the file as a raw file instead of using the qcow library to read the correct content. In other words, users always get the correct checksum by executing `shasum -a 512 ` regardless of the file format. +- It's recommended to provide the expected checksum during backing image creation. + Otherwise, Longhorn will consider the checksum of the first file as the correct one. Once there is something wrong with the first file preparation, which then leads to an incorrect checksum as the expected value, this backing image is probably unavailable. + +### The way of creating a backing image + +#### Create a backing image via Longhorn UI +On **Setting > Backing Image** page, users can create backing images with any kinds of data source. + +#### Create a backing image via YAML +You can download a file or export an existing volume as a backing image via YAML. +It's better not to "upload" a file via YAML. Otherwise, you need to manually handle the data upload via HTTP requests. + +Here are some examples: +```yaml +apiVersion: longhorn.io/v1beta2 +kind: BackingImage +metadata: + name: bi-download + namespace: longhorn-system +spec: + sourceType: download + sourceParameters: + url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.raw + checksum: 304f3ed30ca6878e9056ee6f1b02b328239f0d0c2c1272840998212f9734b196371560b3b939037e4f4c2884ce457c2cbc9f0621f4f5d1ca983983c8cdf8cd9a +``` +```yaml +apiVersion: longhorn.io/v1beta2 +kind: BackingImage +metadata: + name: bi-export + namespace: longhorn-system +spec: + sourceType: export-from-volume + sourceParameters: + volume-name: vol-export-src + export-type: qcow2 +``` + +#### Create and use a backing image via StorageClass and PVC +1. In a Longhorn StorageClass. +2. Setting parameter `backingImageName` means asking Longhorn to use this backing image during volume creation. +3. If you want to create the backing image as long as it does not exist during the CSI volume creation, parameters `backingImageDataSourceType` and `backingImageDataSourceParameters` should be set as well. Similar to YAML, it's better not to create a backing image via "upload" in StorageClass. Note that if all of these parameters are set and the backing image already exists, Longhorn will validate if the parameters matches the existing one before using it. 
+ - For `download`: + ```yaml + kind: StorageClass + apiVersion: storage.k8s.io/v1 + metadata: + name: longhorn-backing-image-example + provisioner: driver.longhorn.io + allowVolumeExpansion: true + reclaimPolicy: Delete + volumeBindingMode: Immediate + parameters: + numberOfReplicas: "3" + staleReplicaTimeout: "2880" + backingImage: "bi-download" + backingImageDataSourceType: "download" + backingImageDataSourceParameters: '{"url": "https://backing-image-example.s3-region.amazonaws.com/test-backing-image"}' + backingImageChecksum: "SHA512 checksum of the backing image" + ``` + - For `export-from-volume`: + ```yaml + kind: StorageClass + apiVersion: storage.k8s.io/v1 + metadata: + name: longhorn-backing-image-example + provisioner: driver.longhorn.io + allowVolumeExpansion: true + reclaimPolicy: Delete + volumeBindingMode: Immediate + parameters: + numberOfReplicas: "3" + staleReplicaTimeout: "2880" + backingImage: "bi-export-from-volume" + backingImageDataSourceType: "export-from-volume" + backingImageDataSourceParameters: '{"volume-name": "vol-export-src", "export-type": "qcow2"}' + ``` + +4. Create a PVC with the StorageClass. Then the backing image will be created (with the Longhorn volume) if it does not exist. +5. Longhorn starts to prepare the backing images to disks for the replicas when a volume using the backing image is attached to a node. + +#### Notice: +- Please be careful of the escape character `\` when you input a download URL in a StorageClass. + +## Utilize a backing image in a volume + +Users can [directly create then immediately use a backing image via StorageClass](./#create-and-use-a-backing-image-via-storageclass-and-pvc), +or utilize an existing backing image as mentioned below. + +#### Use an existing backing +##### Use an existing backing Image during volume creation +1. Click **Setting > Backing Image** in the Longhorn UI. +2. Click **Create Backing Image** to create a backing image with a unique name and a valid URL. +3. During the volume creation, specify the backing image from the backing image list. +4. Longhorn starts to download the backing image to disks for the replicas when a volume using the backing image is attached to a node. + +##### Use an existing backing Image during volume restore +1. Click `Backup` and pick up a backup volume for the restore. +2. As long as the backing image is already set for the backup volume, Longhorn will automatically choose the backing image during the restore. +3. Longhorn allows you to re-specify/override the backing image during the restore. + +#### Download the backing image file to the local machine +Since v1.3.0, users can download existing backing image files to the local via UI. + +#### Notice: +- Users need to make sure the backing image existence when they use UI to create or restore a volume with a backing image specified. +- Before downloading an existing backing image file to the local, users need to guarantee there is a ready file for it. + +## Clean up backing images + +#### Clean up backing images in disks +- Longhorn automatically cleans up the unused backing image files in the disks based on [the setting `Backing Image Cleanup Wait Interval`](../../references/settings#backing-image-cleanup-wait-interval). But Longhorn will retain at least one file in a disk for each backing image anyway. +- The unused backing images can be also cleaned up manually via the Longhorn UI: Click **Setting > Backing Image > Operation list of one backing image > Clean Up**. Then choose disks. 
+- Once there is one replica in a disk using a backing image, no matter what the replica's current state is, the backing image file in this disk cannot be cleaned up. + +#### Delete backing images +- The backing image can be deleted only when there is no volume using it. + +## Backing image recovery +- If there is still a ready backing image file in one disk, Longhorn will automatically clean up the failed backing image files then re-launch these files from the ready one. +- If somehow all files of a backing image become failed, and the first file is : + - downloaded from a URL, Longhorn will restart the downloading. + - exported from an existing volume, Longhorn will (attach the volume if necessary then) restart the export. + - uploaded from user local env, there is no way to recover it. Users need to delete this backing image then re-create a new one by re-uploading the file. +- When a node is down or the backing image manager pod on the node is unavailable, all backing image files on the node will become `unknown`. Later on if the node is back and the pod is running, Longhorn will detect then reuse the existing files automatically. + +## Backing image Workflow +1. To manage all backing image files in a disk, Longhorn will create one backing image manager pod for each disk. Once the disk has no backing image file requirement, the backing image manager will be removed automatically. +2. Once a backing image file is prepared by the backing image manager for a disk, the file will be shared among all volume replicas in this disk. +3. When a backing image is created, Longhorn will launch a backing image data source pod to prepare the first file. The file data is from the data source users specified (download from remote/upload from local/export from the volume). After the preparation done, the backing image manager pod in the same disk will take over the file then Longhorn will stop the backing image data source pod. +4. Once a new backing image is used by a volume, the backing image manager pods in the disks that the volume replicas reside on will be asked to sync the file from the backing image manager pods that already contain the file. +5. As mentioned in the section [#clean-up-backing-images-in-disks](#clean-up-backing-images-in-disks), the file will be cleaned up automatically if all replicas in one disk do not use one backing image file. + +## Warning +- The download URL of the backing image should be public. We will improve this part in the future. +- If there is high memory usage of one backing image manager pod after [file download](#download-the-backing-image-file-to-the-local-machine), this is caused by the system cache/buffer. The memory usage will decrease automatically hence you don't need to worry about it. See [the GitHub ticket](https://github.com/longhorn/longhorn/issues/4055) for more details. + +## History +* Available since v1.1.1 [Enable backing image feature in Longhorn](https://github.com/Longhorn/Longhorn/issues/2006) +* Support [upload]((https://github.com/longhorn/longhorn/issues/2404) and [volume exporting](https://github.com/longhorn/longhorn/issues/2403) since v1.2.0. +* Support [download to local]((https://github.com/longhorn/longhorn/issues/2404) and [volume exporting](https://github.com/longhorn/longhorn/issues/3155) since v1.3.0. 
diff --git a/content/docs/1.5.1/advanced-resources/data-recovery/_index.md b/content/docs/1.5.1/advanced-resources/data-recovery/_index.md new file mode 100644 index 000000000..e610f95e7 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/data-recovery/_index.md @@ -0,0 +1,4 @@ +--- +title: Data Recovery +weight: 4 +--- diff --git a/content/docs/1.5.1/advanced-resources/data-recovery/corrupted-replica.md b/content/docs/1.5.1/advanced-resources/data-recovery/corrupted-replica.md new file mode 100644 index 000000000..ee0fa8993 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/data-recovery/corrupted-replica.md @@ -0,0 +1,33 @@ +--- +title: Identifying Corrupted Replicas +weight: 3 +--- + +In the case that one of the disks used by Longhorn went bad, you might experience intermittent input/output errors when using a Longhorn volume. + +For example, one file sometimes cannot be read, but later it can. In this scenario, it's likely one of the disks went bad, resulting in one of the replicas returning incorrect data to the user. + +To recover the volume, we can identify the corrupted replica and remove it from the volume: + +1. Scale down the workload to detach the volume. +2. Find all the replicas' locations by checking the Longhorn UI. The directories used by the replicas will be shown as a tooltip for each replica in the UI. +3. Log in to each node that contains a replica of the volume and get to the directory that contains the replica data. + + For example, the replica might be stored at: + + /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2 +4. Run a checksum for every file under that directory. + + For example: + + ``` + # sha512sum /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/* + fcd1b3bb677f63f58a61adcff8df82d0d69b669b36105fc4f39b0baf9aa46ba17bd47a7595336295ef807769a12583d06a8efb6562c093574be7d14ea4d6e5f4 /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/revision.counter + c53649bf4ad843dd339d9667b912f51e0a0bb14953ccdc9431f41d46c85301dff4a021a50a0bf431a931a43b16ede5b71057ccadad6cf37a54b2537e696f4780 /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/volume-head-000.img + f6cd5e486c88cb66c143913149d55f23e6179701f1b896a1526717402b976ed2ea68fc969caeb120845f016275e0a9a5b319950ae5449837e578665e2ffa82d0 /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/volume-head-000.img.meta + e6f6e97a14214aca809a842d42e4319f4623adb8f164f7836e07dc8a3f4816a0389b67c45f7b0d9f833d50a731ae6c4670ba1956833f1feb974d2d12421b03f7 /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/volume.meta + ``` + +5. Compare the output of each replica. One of them should fail or have different results compared to the others. This will be the one replica we need to remove from the volume. +6. Use the Longhorn UI to remove the identified replica from the volume. +7. Scale up the workload to make sure the error is gone. diff --git a/content/docs/1.5.1/advanced-resources/data-recovery/data-error.md b/content/docs/1.5.1/advanced-resources/data-recovery/data-error.md new file mode 100644 index 000000000..8ddc31eba --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/data-recovery/data-error.md @@ -0,0 +1,38 @@ +--- +title: Identifying and Recovering from Data Errors +weight: 1 +--- + +If you've encountered an error message like the following: + + 'fsck' found errors on device /dev/longhorn/pvc-6288f5ea-5eea-4524-a84f-afa14b85780d but could not correct them. 
+ +Then you have a data corruption situation. This section describes how to address the issue. + +## Bad Underlying Disk + +To determine if the error is caused because one of the underlying disks went bad, follow [these steps](../corrupted-replica) to identify corrupted replicas. + +If most of the replicas on the disk went bad, that means the disk is unreliable now and should be replaced. + +If only one replica on the disk went bad, it can be a situation known as `bit rot`. In this case, removing the replica is good enough. + +## Recover from a Snapshot + +If all the replicas are identical, then the volume needs to be recovered using snapshots. + +The reason for this is probably that the bad bit was written from the workload the volume attached to. + +To revert to a previous snapshot: + +1. In maintenance mode, attach the volume to any node. +2. Revert to a snapshot. You should start with the latest one. +3. Detach the volume from maintenance mode to any node. +4. Re-attach the volume to a node you have access to. +5. Mount the volume from `/dev/longhorn/` and check the volume content. +6. If the volume content is still incorrect, repeat from step 1. +7. Once you find a usable snapshot, make a new snapshot from there and start using the volume as normal. + +## Recover from Backup + +If all of the methods above failed, use a backup to [recover the volume.](../../../snapshots-and-backups/backup-and-restore/restore-from-a-backup) \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/data-recovery/export-from-replica.md b/content/docs/1.5.1/advanced-resources/data-recovery/export-from-replica.md new file mode 100644 index 000000000..a6786300d --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/data-recovery/export-from-replica.md @@ -0,0 +1,75 @@ +--- +title: Exporting a Volume from a Single Replica +weight: 2 +--- + +Each replica of a Longhorn volume contains the full data for the volume. + +If the whole Kubernetes cluster or Longhorn system goes offline, the following steps can be used to retrieve the data of the volume. + +1. Identify the volume. + + Longhorn uses the disks on the node to store the replica data. + + By default, the data is stored at the directory specified by the setting [`Default Data Path`](https://longhorn.io/docs/0.8.1/references/settings/#default-data-path). + + More disks can be added to a node by either using the Longhorn UI or by using [a node label and annotation](../../default-disk-and-node-config/). + + You can either keep a copy of the path of those disks, or use the following command to find the disks that have been used by Longhorn. For example: + + ``` + # find / -name longhorn-disk.cfg + /var/lib/longhorn/longhorn-disk.cfg + ``` + + The result above shows that the path `/var/lib/longhorn` has been used by Longhorn to store data. + +2. Check the path found in step 1 to see if it contains the data. + + The data will be stored in the `/replicas` directory, for example: + + ``` + # ls /var/lib/longhorn/replicas/ + pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2 + pvc-71a266e0-5db5-44e5-a2a3-e5471b007cc9-fe160a2c + ``` + + The directory naming pattern is: + + ``` + -<8 bytes UUID> + ``` + + So in the example above, there are two volumes stored here, which are `pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc` and `pvc-71a266e0-5db5-44e5-a2a3-e5471b007cc9`. + + The volume name matches the Kubernetes PV name. + +3. Use the `lsof` command to make sure no one is currently using the volume, e.g. 
+ ``` + # lsof pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/ + COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME + longhorn 14464 root cwd DIR 8,0 4096 541456 pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2 + ``` + The above result shows that the data directory is still being used, so don't proceed to the next step. If it's not being used, `lsof` command should return empty result. +4. Check the volume size of the volume you want to restore using the following command inside the directory: + ``` + # cat pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/volume.meta + {"Size":1073741824,"Head":"volume-head-000.img","Dirty":true,"Rebuilding":false,"Parent":"","SectorSize":512,"BackingFileName":""} + ``` + From the result above, you can see the volume size is `1073741824` (1 GiB). Note the size. +5. To export the content of the volume, use the following command to create a single replica Longhorn volume container: + + ``` + docker run -v /dev:/host/dev -v /proc:/host/proc -v :/volume --privileged longhornio/longhorn-engine:v{{< current-version >}} launch-simple-longhorn + ``` + + For example, based on the information above, the command should be: + + ``` + docker run -v /dev:/host/dev -v /proc:/host/proc -v /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2:/volume --privileged longhornio/longhorn-engine:v{{< current-version >}} launch-simple-longhorn pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc 1073741824 + ``` +**Result:** Now you should have a block device created on `/dev/longhorn/` for this device, such as `/dev/longhorn/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc` for the example above. Now you can mount the block device to get the access to the data. + +> To avoid accidental change of the volume content, it's recommended to use `mount -o ro` to mount the directory as `readonly`. + +After you are done accessing the volume content, use `docker stop` to stop the container. The block device should disappear from the `/dev/longhorn/` directory after the container is stopped. \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/data-recovery/full-disk.md b/content/docs/1.5.1/advanced-resources/data-recovery/full-disk.md new file mode 100644 index 000000000..affb2cd39 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/data-recovery/full-disk.md @@ -0,0 +1,21 @@ +--- +title: Recovering from a Full Disk +weight: 4 +--- + +If one disk used by one of the Longhorn replicas is full, that replica will go to the error state, and Longhorn will rebuild another replica on another node/disk. + +To recover from a full disk, + +1. Disable the scheduling for the full disk. + + Longhorn should have already marked the disk as `unschedulable`. + + This step is to make sure the disk will not be scheduled to by accident after more space is freed up. + +2. Identify the replicas in the error state on the disk using the Longhorn UI's disk page. +3. Remove the replicas in the error state. + +## Recommended after Recovery + +We recommend adding more disks or more space to the node that had this situation. 
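To confirm that space has actually been freed and to see how Longhorn currently accounts for the disk, a quick check along these lines can help (a sketch assuming the default data path `/var/lib/longhorn` and `kubectl` access to the cluster; adjust the path and node name for your setup):

```shell
# Filesystem usage for the Longhorn data path on the affected node
df -h /var/lib/longhorn

# How Longhorn reports the disk status for that node (available/scheduled storage and conditions)
kubectl -n longhorn-system get nodes.longhorn.io <node-name> -o yaml
```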
\ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/data-recovery/recover-without-system.md b/content/docs/1.5.1/advanced-resources/data-recovery/recover-without-system.md new file mode 100644 index 000000000..88c916eb2 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/data-recovery/recover-without-system.md @@ -0,0 +1,59 @@ +--- +title: Recovering from a Longhorn Backup without System Installed +weight: 5 +--- + +This command gives users the ability to restore a backup to a `raw` image or a `qcow2` image. If the backup is based on a backing file, users should provide the backing file as a `qcow2` image with `--backing file` parameter. + +1. Copy the [yaml template](https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/examples/restore_to_file.yaml.template): Make a copy of `examples/restore_to_file.yaml.template` as e.g. `restore.yaml`. + +2. Set the node which the output file should be placed on by replacing ``, e.g. `node1`. + +3. Specify the host path of output file by modifying field `hostpath` of volume `disk-directory`. By default the directory is `/tmp/restore/`. + +4. Set the first argument (backup url) by replacing ``, e.g. `s3://@/backupstore?backup=&volume=`. + + - `` and `` can be retrieved from backup.cfg stored in the backup destination folder, e.g. `backup_backup-72bcbdad913546cf.cfg`. The content will be like below: + + ```json + {"Name":"backup-72bcbdad913546cf","VolumeName":"volume_1","SnapshotName":"79758033-a670-4724-906f-41921f53c475"} + ``` + +5. Set argument `output-file` by replacing ``, e.g. `volume.raw` or `volume.qcow2`. + +6. Set argument `output-format` by replacing ``. The supported options are `raw` or `qcow2`. + +7. Set argument `longhorn-version` by replacing ``, e.g. `v{{< current-version >}}` + +8. Set the S3 Credential Secret by replacing ``, e.g. `minio-secret`. + + - The credential secret can be referenced [here](https://longhorn.io/docs/{{< current-version >}}/snapshots-and-backups/backup-and-restore/set-backup-target/#set-up-aws-s3-backupstore) and must be created in the `longhorn-system' namespace. + +9. Execute the yaml using e.g.: + + kubectl create -f restore.yaml + +10. Watch the result using: + + kubectl -n longhorn-system get pod restore-to-file -w + +After the pod status changed to `Completed`, you should able to find `` at e.g. `/tmp/restore` on the ``. + +We also provide a script, [restore-backup-to-file.sh](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/restore-backup-to-file.sh), to restore a backup. The following parameters should be specified: + - `--backup-url`: Specifies the backups S3/NFS URL. e.g., `s3://backupbucket@us-east-1/backupstore?backup=backup-bd326da2c4414b02&volume=volumeexamplename"` + + - `--output-file`: Set the output file name. e.g, `volume.raw` + + - `--output-format`: Set the output file format. e.g. `raw` or `qcow2` + + - `--version`: Specifies the version of Longhorn to use. e.g., `v{{< current-version >}}` + +Optional parameters can be specified: + + - `--aws-access-key`: Specifies AWS credentials access key if backups is s3. + + - `--aws-secret-access-key`: Specifies AWS credentials access secret key if backups is s3. + + - `--backing-file`: backing image. e.g., `/tmp/backingfile.qcow2` + +The output image files can be found in the `/tmp/restore` folder after the script has finished running. 
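For reference, a full invocation of the script with the parameters above might look like the following sketch (every value is a placeholder taken from the examples on this page):

```shell
./restore-backup-to-file.sh \
  --backup-url "s3://backupbucket@us-east-1/backupstore?backup=backup-bd326da2c4414b02&volume=volumeexamplename" \
  --output-file volume.raw \
  --output-format raw \
  --version v{{< current-version >}} \
  --aws-access-key <aws-access-key> \
  --aws-secret-access-key <aws-secret-access-key>
```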
\ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/default-disk-and-node-config.md b/content/docs/1.5.1/advanced-resources/default-disk-and-node-config.md new file mode 100644 index 000000000..830c260b4 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/default-disk-and-node-config.md @@ -0,0 +1,107 @@ +--- +title: Configuring Defaults for Nodes and Disks +weight: 6 +--- + +_Available as of v0.8.1_ + +This feature allows the user to customize the default disks and node configurations in Longhorn for newly added nodes using Kubernetes labels and annotations instead of the Longhorn API or UI. + +Customizing the default configurations for disks and nodes is useful for scaling the cluster because it eliminates the need to configure Longhorn manually for each new node if the node contains more than one disk, or if the disk configuration is different for new nodes. + +Longhorn will not keep the node labels or annotations in sync with the current Longhorn node disks or tags. Nor will Longhorn keep the node disks or tags in sync with the nodes, labels or annotations after the default disks or tags have been created. + +### Adding Node Tags to New Nodes + +When a node does not have a tag, you can use a node annotation to set the node tags, as an alternative to using the Longhorn UI or API. + +1. Scale up the Kubernetes cluster. The newly added nodes contain no node tags. +2. Add annotations to the new Kubernetes nodes that specify what the default node tags should be. The annotation format is: + + ``` + node.longhorn.io/default-node-tags: + ``` + For example: + + ``` + node.longhorn.io/default-node-tags: '["fast","storage"]' + ``` +3. Wait for Longhorn to sync the node tag automatically. + +> **Result:** If the node tag list was originally empty, Longhorn updates the node with the tag list, and you will see the tags for that node updated according to the annotation. If the node already had tags, you will see no change to the tag list. +### Customizing Default Disks for New Nodes + +Longhorn uses the **Create Default Disk on Labeled Nodes** setting to enable default disk customization. + +If the setting is disabled, Longhorn will create a default disk using `setting.default-data-path` on all new nodes. + +If the setting is enabled, Longhorn will decide to create the default disks or not, depending on the node's label value of `node.longhorn.io/create-default-disk`. + +- If the node's label value is `true`, Longhorn will create the default disk using `settings.default-data-path` on the node. If the node already has existing disks, Longhorn will not change anything. +- If the node's label value is `config`, Longhorn will check for the `node.longhorn.io/default-disks-config` annotation and create default disks according to it. If there is no annotation, or if the annotation is invalid, or the label value is invalid, Longhorn will not change anything. + +The value of the label will be in effect only when the setting is enabled. + +If the `create-default-disk` label is not set, the default disk will not be automatically created on the new nodes when the setting is enabled. + +The configuration described in the annotation only takes effect when there are no existing disks or tags on the node. + +If the label or annotation fails validation, the whole annotation is ignored. + +> **Prerequisite:** The Longhorn setting **Create Default Disk on Labeled Nodes** must be enabled. +1. Add new nodes to the Kubernetes cluster. +2. Add the label to the node. 
Longhorn relies on the label to decide how to customize default disks: + + ``` + node.longhorn.io/create-default-disk: 'config' + ``` + +3. Then add an annotation to the node. The annotation is used to specify the configuration of default disks. The format is: + + ``` + node.longhorn.io/default-disks-config: + ``` + + For example, the following disk configuration can be specified in the annotation: + + ``` + node.longhorn.io/default-disks-config: + '[ + { + "path":"/mnt/disk1", + "allowScheduling":true + }, + { + "name":"fast-ssd-disk", + "path":"/mnt/disk2", + "allowScheduling":false, + "storageReserved":10485760, + "tags":[ + "ssd", + "fast" + ] + } + ]' + ``` + + > **Note:** If the same name is specified for different disks, the configuration will be treated as invalid. + +4. Wait for Longhorn to create the customized default disks automatically. + +> **Result:** The disks will be updated according to the annotation. + +### Launch Longhorn with multiple disks +1. Add the label to all nodes before launching Longhorn. + + ``` + node.longhorn.io/create-default-disk: 'config' + ``` + +2. Then add the disk config annotation to all nodes: + + ``` + node.longhorn.io/default-disks-config: '[ { "path":"/var/lib/longhorn", "allowScheduling":true + }, { "name":"fast-ssd-disk", "path":"/mnt/extra", "allowScheduling":false, "storageReserved":10485760, + "tags":[ "ssd", "fast" ] }]' + ``` +3. Deploy Longhorn with `create-default-disk-labeled-nodes: true`, check [here](../deploy/customizing-default-settings) for customizing the default settings of Longhorn. diff --git a/content/docs/1.5.1/advanced-resources/deploy/_index.md b/content/docs/1.5.1/advanced-resources/deploy/_index.md new file mode 100644 index 000000000..d7eeb68a8 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/_index.md @@ -0,0 +1,4 @@ +--- +title: Deploy +weight: 2 +--- \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/deploy/airgap.md b/content/docs/1.5.1/advanced-resources/deploy/airgap.md new file mode 100644 index 000000000..5b2e9439f --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/airgap.md @@ -0,0 +1,350 @@ +--- +title: Air Gap Installation +weight: 2 +--- + +Longhorn can be installed in an air gapped environment by using a manifest file, a Helm chart, or the Rancher UI. + +- [Requirements](#requirements) +- [Using a Manifest File](#using-a-manifest-file) +- [Using a Helm chart](#using-a-helm-chart) +- [Using a Rancher app](#using-a-rancher-app) +- [Troubleshooting](#troubleshooting) + +## Requirements + - Deploy Longhorn Components images to your own registry. + - Deploy Kubernetes CSI driver components images to your own registry. + +#### Note: + - A full list of all needed images is in [longhorn-images.txt](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/longhorn-images.txt). First, download the images list by running: + ```shell + wget https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/longhorn-images.txt + ``` + - We provide a script, [save-images.sh](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/save-images.sh), to quickly pull the above `longhorn-images.txt` list. If you specify a `tar.gz` file name for flag `--images`, the script will save all images to the provided filename. In the example below, the script pulls and saves Longhorn images to the file `longhorn-images.tar.gz`. You then can copy the file to your air-gap environment. 
On the other hand, if you don't specify the file name, the script just pulls the list of images to your computer. + ```shell + wget https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/save-images.sh + chmod +x save-images.sh + ./save-images.sh --image-list longhorn-images.txt --images longhorn-images.tar.gz + ``` + - We provide another script, [load-images.sh](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/load-images.sh), to push Longhorn images to your private registry. If you specify a `tar.gz` file name for flag `--images`, the script loads images from the `tar` file and pushes them. Otherwise, it will find images in your local Docker and push them. In the example below, the script loads images from the file `longhorn-images.tar.gz` and pushes them to `` + ```shell + wget https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/load-images.sh + chmod +x load-images.sh + ./load-images.sh --image-list longhorn-images.txt --images longhorn-images.tar.gz --registry + ``` + - For more options with using the scripts, see flag `--help`: + ```shell + ./save-images.sh --help + ./load-images.sh --help + ``` + +## Using a Manifest File + +1. Get Longhorn Deployment manifest file + + `wget https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/longhorn.yaml` + +2. Create Longhorn namespace + + `kubectl create namespace longhorn-system` + + +3. If private registry require authentication, Create `docker-registry` secret in `longhorn-system` namespace: + + `kubectl -n longhorn-system create secret docker-registry --docker-server= --docker-username= --docker-password=` + + * Add your secret name `SECRET_NAME` to `imagePullSecrets.name` in the following resources + * `longhorn-driver-deployer` Deployment + * `longhorn-manager` DaemonSet + * `longhorn-ui` Deployment + + Example: + ```yaml + apiVersion: apps/v1 + kind: Deployment + metadata: + labels: + app: longhorn-ui + name: longhorn-ui + namespace: longhorn-system + spec: + replicas: 1 + selector: + matchLabels: + app: longhorn-ui + template: + metadata: + labels: + app: longhorn-ui + spec: + containers: + - name: longhorn-ui + image: longhornio/longhorn-ui:v0.8.0 + ports: + - containerPort: 8000 + env: + - name: LONGHORN_MANAGER_IP + value: "http://longhorn-backend:9500" + imagePullSecrets: + - name: ## Add SECRET_NAME here + serviceAccountName: longhorn-service-account + ``` + +4. 
Apply the following modifications to the manifest file + + * Modify Kubernetes CSI driver components environment variables in `longhorn-driver-deployer` Deployment point to your private registry images + * CSI_ATTACHER_IMAGE + * CSI_PROVISIONER_IMAGE + * CSI_NODE_DRIVER_REGISTRAR_IMAGE + * CSI_RESIZER_IMAGE + * CSI_SNAPSHOTTER_IMAGE + + ```yaml + - name: CSI_ATTACHER_IMAGE + value: /csi-attacher: + - name: CSI_PROVISIONER_IMAGE + value: /csi-provisioner: + - name: CSI_NODE_DRIVER_REGISTRAR_IMAGE + value: /csi-node-driver-registrar: + - name: CSI_RESIZER_IMAGE + value: /csi-resizer: + - name: CSI_SNAPSHOTTER_IMAGE + value: /csi-snapshotter: + ``` + + * Modify Longhorn images to point to your private registry images + * longhornio/longhorn-manager + + `image: /longhorn-manager:` + + * longhornio/longhorn-engine + + `image: /longhorn-engine:` + + * longhornio/longhorn-instance-manager + + `image: /longhorn-instance-manager:` + + * longhornio/longhorn-share-manager + + `image: /longhorn-share-manager:` + + * longhornio/longhorn-ui + + `image: /longhorn-ui:` + + Example: + ```yaml + apiVersion: apps/v1 + kind: Deployment + metadata: + labels: + app: longhorn-ui + name: longhorn-ui + namespace: longhorn-system + spec: + replicas: 1 + selector: + matchLabels: + app: longhorn-ui + template: + metadata: + labels: + app: longhorn-ui + spec: + containers: + - name: longhorn-ui + image: /longhorn-ui: ## Add image name and tag here + ports: + - containerPort: 8000 + env: + - name: LONGHORN_MANAGER_IP + value: "http://longhorn-backend:9500" + imagePullSecrets: + - name: + serviceAccountName: longhorn-service-account + ``` + +5. Deploy Longhorn using modified manifest file + `kubectl apply -f longhorn.yaml` + +## Using a Helm Chart + +In v{{< current-version >}}, Longhorn automatically adds prefix to images. You simply need to set the registryUrl parameters to pull images from your private registry. + +> **Note:** Once you set registryUrl to your private registry, Longhorn tries to pull images from the registry exclusively. Make sure all Longhorn components' images are in the registry otherwise Longhorn will fail to pull images. + +### Use default image name + +If you keep the images' names as recommended [here](./#recommendation), you only need to do the following steps: + +1. Clone the Longhorn repo: + + `git clone https://github.com/longhorn/longhorn.git` + +2. In `chart/values.yaml` + + * Specify `Private registry URL`. If the registry requires authentication, specify `Private registry user`, `Private registry password`, and `Private registry secret`. + Longhorn will automatically generate a secret with the those information and use it to pull images from your private registry. + + ```yaml + defaultSettings: + registrySecret: + + privateRegistry: + registryUrl: + registryUser: + registryPasswd: + registrySecret: + ``` + +### Use custom image name + +If you want to use custom images' names, you can use the following steps: + +1. Clone longhorn repo + + `git clone https://github.com/longhorn/longhorn.git` + +2. In `chart/values.yaml` + + > **Note:** Do not include the private registry prefix, it will be added automatically. e.g: if your image is `example.com/username/longhorn-manager`, use `username/longhorn-manager` in the following charts. 
+ + - Specify Longhorn images and tag: + + ```yaml + image: + longhorn: + engine: + repository: longhornio/longhorn-engine + tag: + manager: + repository: longhornio/longhorn-manager + tag: + ui: + repository: longhornio/longhorn-ui + tag: + instanceManager: + repository: longhornio/longhorn-instance-manager + tag: + shareManager: + repository: longhornio/longhorn-share-manager + tag: + ``` + + - Specify CSI Driver components images and tag: + + ```yaml + csi: + attacher: + repository: longhornio/csi-attacher + tag: + provisioner: + repository: longhornio/csi-provisioner + tag: + nodeDriverRegistrar: + repository: longhornio/csi-node-driver-registrar + tag: + resizer: + repository: longhornio/csi-resizer + tag: + snapshotter: + repository: longhornio/csi-snapshotter + tag: + ``` + + - Specify `Private registry URL`. If the registry requires authentication, specify `Private registry user`, `Private registry password`, and `Private registry secret`. + Longhorn will automatically generate a secret with the those information and use it to pull images from your private registry. + + ```yaml + defaultSettings: + registrySecret: + + privateRegistry: + registryUrl: + registryUser: + registryPasswd: + ``` + +3. Install Longhorn + * **Helm2** + + `helm install ./chart --name longhorn --namespace longhorn-system` + + * **Helm3** + + `kubectl create namespace longhorn-system` + + `helm install longhorn ./chart --namespace longhorn-system` + +# Using a Rancher App + +### Use default image name + +If you keep the images' names as recommended [here](./#recommendation), you only need to do the following steps: + +- In the `Private Registry Settings` section specify: + - Private registry URL + - Private registry user + - Private registry password + - Private registry secret name + + Longhorn will automatically generate a secret with the those information and use it to pull images from your private registry. + + ![images](/img/screenshots/airgap-deploy/app-default-images.png) + +### Use custom image name + +- If you want to use custom images' names, you can set `Use Default Images` to `False` and specify images' names. + + > **Note:** Do not include the private registry prefix, it will be added automatically. e.g: if your image is `example.com/username/longhorn-manager`, use `username/longhorn-manager` in the following charts. + + ![images](/img/screenshots/airgap-deploy/app-custom-images.png) + +- Specify `Private registry URL`. If the registry requires authentication, specify `Private registry user`, `Private registry password`, and `Private registry secret name`. + Longhorn will automatically generate a secret with the those information and use it to pull images from your private registry. + + ![images](/img/screenshots/airgap-deploy/app-custom-images-reg.png) + +## Troubleshooting + +#### For Helm/Rancher installation, if user forgot to submit a secret to authenticate to private registry, `longhorn-manager DaemonSet` will fail to create. + + +1. Create the Kubernetes secret + + `kubectl -n longhorn-system create secret docker-registry --docker-server= --docker-username= --docker-password=` + + +2. Create `registry-secret` setting object manually. + + ```yaml + apiVersion: longhorn.io/v1beta2 + kind: Setting + metadata: + name: registry-secret + namespace: longhorn-system + value: + ``` + + `kubectl apply -f registry-secret.yml` + + +3. Delete Longhorn and re-install it again. 
+ + * **Helm2** + + `helm uninstall ./chart --name longhorn --namespace longhorn-system` + + `helm install ./chart --name longhorn --namespace longhorn-system` + + * **Helm3** + + `helm uninstall longhorn ./chart --namespace longhorn-system` + + `helm install longhorn ./chart --namespace longhorn-system` + +## Recommendation: +It's highly recommended not to manipulate image tags, especially instance manager image tags such as v1_20200301, because we intentionally use the date to avoid associating it with a Longhorn version. + +The images of Longhorn's components are hosted in Dockerhub under the `longhornio` account. For example, `longhornio/longhorn-manager:v{{< current-version >}}`. It's recommended to keep the account name, `longhornio`, the same when you push the images to your private registry. This helps avoid unnecessary configuration issues. diff --git a/content/docs/1.5.1/advanced-resources/deploy/customizing-default-settings.md b/content/docs/1.5.1/advanced-resources/deploy/customizing-default-settings.md new file mode 100644 index 000000000..12fc1510e --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/customizing-default-settings.md @@ -0,0 +1,211 @@ +--- +title: Customizing Default Settings +weight: 1 +--- + +You may customize Longhorn's default settings while installing or upgrading. You may specify, for example, `Create Default Disk With Node Labeled` and `Default Data Path` before starting Longhorn. + +The default settings can be customized in the following ways: + +- [Installation](#installation) + - [Using the Rancher UI](#using-the-rancher-ui) + - [Using the Longhorn Deployment YAML File](#using-the-longhorn-deployment-yaml-file) + - [Using Helm](#using-helm) +- [Update Settings](#update-settings) + - [Using the Longhorn UI](#using-the-longhorn-ui) + - [Using the Rancher UI](#using-the-rancher-ui-1) + - [Using Kubectl](#using-kubectl) + - [Using Helm](#using-helm-1) +- [Upgrade](#upgrade) + - [Using the Rancher UI](#using-the-rancher-ui-2) + - [Using the Longhorn Deployment YAML File](#using-the-longhorn-deployment-yaml-file-1) + - [Using Helm](#using-helm-2) +- [History](#history) + + +> **NOTE:** When using Longhorn Deployment YAML file or Helm for installation, updating or upgrading, if the value of a default setting is an empty string and valid, the default setting will be cleaned up in Longhorn. If not, Longhorn will ignore the invalid values and will not update the default values. + +## Installation +### Using the Rancher UI + +From the project view in Rancher, go to **Apps && Marketplace > Longhorn > Install > Next > Edit Options > Longhorn Default Settings > Customize Default Settings** and edit the settings before installing the app. + +### Using the Longhorn Deployment YAML File + +1. Download the longhorn repo: + + ```shell + git clone https://github.com/longhorn/longhorn.git + ``` + +1. Modify the config map named `longhorn-default-setting` in the yaml file `longhorn/deploy/longhorn.yaml`. + + In the below example, users customize the default settings, backup-target, backup-target-credential-secret, and default-data-path. + When the setting is absent or has a leading `#` symbol, the default setting will use the default value in Longhorn or the customized values previously configured. 
+ + ```yaml + --- + apiVersion: v1 + kind: ConfigMap + metadata: + name: longhorn-default-setting + namespace: longhorn-system + data: + default-setting.yaml: |- + backup-target: s3://backupbucket@us-east-1/backupstore + backup-target-credential-secret: minio-secret + #allow-recurring-job-while-volume-detached: + #create-default-disk-labeled-nodes: + default-data-path: /var/lib/longhorn-example/ + #replica-soft-anti-affinity: + #replica-auto-balance: + #storage-over-provisioning-percentage: + #storage-minimal-available-percentage: + #upgrade-checker: + #default-replica-count: + #default-data-locality: + #default-longhorn-static-storage-class: + #backupstore-poll-interval: + #taint-toleration: + #system-managed-components-node-selector: + #priority-class: + #auto-salvage: + #auto-delete-pod-when-volume-detached-unexpectedly: + #disable-scheduling-on-cordoned-node: + #replica-zone-soft-anti-affinity: + #node-down-pod-deletion-policy: + #node-drain-policy: + #replica-replenishment-wait-interval: + #concurrent-replica-rebuild-per-node-limit: + #disable-revision-counter: + #system-managed-pods-image-pull-policy: + #allow-volume-creation-with-degraded-availability: + #auto-cleanup-system-generated-snapshot: + #concurrent-automatic-engine-upgrade-per-node-limit: + #backing-image-cleanup-wait-interval: + #backing-image-recovery-wait-interval: + #guaranteed-instance-manager-cpu: + #kubernetes-cluster-autoscaler-enabled: + #orphan-auto-deletion: + #storage-network: + #recurring-successful-jobs-history-limit: + #recurring-failed-jobs-history-limit: + --- + ``` + +### Using Helm + +Use the Helm command with the `--set` flag to modify the default settings. For example: + +- Helm 2 + ```shell + helm install longhorn/longhorn \ + --name longhorn \ + --namespace longhorn-system \ + --set defaultSettings.taintToleration="key1=value1:NoSchedule; key2:NoExecute" + ``` + +- Helm 3 + ```shell + helm install longhorn longhorn/longhorn \ + --namespace longhorn-system \ + --create-namespace \ + --set defaultSettings.taintToleration="key1=value1:NoSchedule; key2:NoExecute" + ``` + +You can also provide a copy of the `values.yaml` file with the default settings modified to the `--values` flag when running the Helm command: + +1. Obtain a copy of the `values.yaml` file from GitHub: + + ```shell + curl -Lo values.yaml https://raw.githubusercontent.com/longhorn/charts/master/charts/longhorn/values.yaml + ``` + +2. Modify the default settings in the YAML file. The following is an example snippet of `values.yaml`: + + When the setting is absent or has a leading `#` symbol, the default setting will use the default value in Longhorn or the customized values previously configured. 
+ + ```yaml + defaultSettings: + backupTarget: s3://backupbucket@us-east-1/backupstore + backupTargetCredentialSecret: minio-secret + createDefaultDiskLabeledNodes: true + defaultDataPath: /var/lib/longhorn-example/ + replicaSoftAntiAffinity: false + storageOverProvisioningPercentage: 600 + storageMinimalAvailablePercentage: 15 + upgradeChecker: false + defaultReplicaCount: 2 + defaultDataLocality: disabled + defaultLonghornStaticStorageClass: longhorn-static-example + backupstorePollInterval: 500 + taintToleration: key1=value1:NoSchedule; key2:NoExecute + systemManagedComponentsNodeSelector: "label-key1:label-value1" + priorityClass: high-priority + autoSalvage: false + disableSchedulingOnCordonedNode: false + replicaZoneSoftAntiAffinity: false + volumeAttachmentRecoveryPolicy: never + nodeDownPodDeletionPolicy: do-nothing + guaranteedInstanceManagerCpu: 15 + orphanAutoDeletion: false + ``` + +3. Run Helm with `values.yaml`: + - Helm 2 + ```shell + helm install longhorn/longhorn \ + --name longhorn \ + --namespace longhorn-system \ + --values values.yaml + ``` + - Helm 3 + ```shell + helm install longhorn longhorn/longhorn \ + --namespace longhorn-system \ + --create-namespace \ + --values values.yaml + ``` + +For more info about using helm, see the section about +[installing Longhorn with Helm](../../../deploy/install/install-with-helm) + +## Update Settings + +### Using the Longhorn UI + +We recommend using the Longhorn UI to change Longhorn setting on the existing cluster. It would make the setting persistent. + +### Using the Rancher UI + +From the project view in Rancher, go to **Apps && Marketplace > Longhorn > Upgrade > Next > Edit Options > Longhorn Default Settings > Customize Default Settings** and edit the settings before upgrading the app to the current Longhorn version. + +### Using Kubectl + +If you prefer to use the command line to update the setting, you could use `kubectl`. +```shell +kubectl edit settings -n longhorn-system +``` + +### Using Helm + +Modify the default settings in the YAML file as described in [Fresh Installation > Using Helm](#using-helm) and then update the settings using +``` +helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --values ./values.yaml --version `helm list -n longhorn-system -o json | jq -r .'[0].app_version'` +``` + +## Upgrade + +### Using the Rancher UI + +From the project view in Rancher, go to **Apps && Marketplace > Longhorn > Upgrade > Next > Edit Options > Longhorn Default Settings > Customize Default Settings** and edit the settings before upgrading the app. +### Using the Longhorn Deployment YAML File + +Modify the config map named `longhorn-default-setting` in the yaml file `longhorn/deploy/longhorn.yaml` as described in [Fresh Installation > Using the Longhorn Deployment YAML File](#using-the-longhorn-deployment-yaml-file) and then upgrade the Longhorn system using `kubectl`. + +### Using Helm + +Modify the default settings in the YAML file as described in [Fresh Installation > Using Helm](#using-helm) and then upgrade the Longhorn system using `helm upgrade`. 
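For example, if your customized defaults live in a local `values.yaml`, the upgrade could look like the following (same chart name and namespace used elsewhere on this page):

```shell
helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --values ./values.yaml
```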
+ +## History +Available since v1.3.0 ([Reference](https://github.com/longhorn/longhorn/issues/2570)) diff --git a/content/docs/1.5.1/advanced-resources/deploy/node-selector.md b/content/docs/1.5.1/advanced-resources/deploy/node-selector.md new file mode 100644 index 000000000..246efde28 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/node-selector.md @@ -0,0 +1,93 @@ +--- +title: Node Selector +weight: 4 +--- + +If you want to restrict Longhorn components to only run on a particular set of nodes, you can set node selector for all Longhorn components. +For example, you want to install Longhorn in a cluster that has both Linux nodes and Windows nodes but Longhorn cannot run on Windows nodes. +In this case, you can set the node selector to restrict Longhorn to only run on Linux nodes. + +For more information about how node selector work, refer to the [official Kubernetes documentation.](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) + +# Setting up Node Selector for Longhorn +Longhorn system contains user deployed components (e.g, Manager, Driver Deployer, UI) and system managed components (e.g, Instance Manager, Engine Image, CSI Driver, etc.) +You need to set node selector for both types of components. See more details below. + +### Setting up Node Selector During installing Longhorn +1. Set node selector for user deployed components (Manager, UI, Driver Deployer) + * If you install Longhorn by Rancher 2.5.x, you need to click `Edit as YAML` in Rancher UI and copy this values into the YAML: + ```yaml + longhornManager: + nodeSelector: + label-key1: "label-value1" + longhornDriver: + nodeSelector: + label-key1: "label-value1" + longhornUI: + nodeSelector: + label-key1: "label-value1" + ``` + * If you install Longhorn by using `kubectl` to apply [the deployment YAML](https://raw.githubusercontent.com/longhorn/longhorn/v1.1.1/deploy/longhorn.yaml), you need to modify the node selector section for Longhorn Manager, Longhorn UI, and Longhorn Driver Deployer. + Then apply the YAMl files. + * If you install Longhorn using Helm, you can change the Helm values for `longhornManager.nodeSelector`, `longhornUI.nodeSelector`, `longhornDriver.nodeSelector` in the `values.yaml` file. + Then install the chart + +2. Set node selector for system managed components + + Follow the [Customize default settings](../customizing-default-settings/) to set node selector by changing the value for the `system-managed-components-node-selector` default setting + > Note: Because of the limitation of Rancher 2.5.x, if you are using Rancher UI to install Longhorn, you need to click `Edit As Yaml` and add setting `systemManagedComponentsNodeSelector` to `defaultSettings`. + > + > For example: + > ```yaml + > defaultSettings: + > systemManagedComponentsNodeSelector: "label-key1:label-value1" + > ``` + + +### Setting up Node Selector After Longhorn has been installed + +> **Warning**: +> * Since all Longhorn components will be restarted, the Longhorn system is unavailable temporarily. +> * Make sure all Longhorn volumes are `detached`. If there are running Longhorn volumes in the system, this means the Longhorn system cannot restart its components and the request will be rejected. +> * Don't operate the Longhorn system while node selector settings are updated and Longhorn components are being restarted. + +1. 
Prepare + * If you are changing the node selector so that Longhorn can no longer run on some nodes where it is currently running, + you will lose the volume replicas on those nodes. + Therefore, it is recommended that you evict replicas and disable scheduling for those nodes before changing the node selector. + See [Evicting Replicas on Disabled Disks or Nodes](../../../volumes-and-nodes/disks-or-nodes-eviction) for more details about how to do this. + * Stop all workloads and detach all Longhorn volumes. Make sure all Longhorn volumes are `detached`. + +2. Set node selector for user deployed components + * If you install Longhorn via the Rancher UI, you need to click `Edit as YAML`, copy these values into the YAML, and then click upgrade: + ```yaml + longhornManager: + nodeSelector: + label-key1: "label-value1" + longhornDriver: + nodeSelector: + label-key1: "label-value1" + longhornUI: + nodeSelector: + label-key1: "label-value1" + ``` + * If you install Longhorn by using `kubectl` to apply [the deployment YAML](https://raw.githubusercontent.com/longhorn/longhorn/v1.1.1/deploy/longhorn.yaml), you need to modify the node selector section for Longhorn Manager, Longhorn UI, and Longhorn Driver Deployer. + Then reapply the YAML files. + * If you install Longhorn using Helm, you can change the Helm values for `longhornManager.nodeSelector`, `longhornUI.nodeSelector`, and `longhornDriverDeployer.nodeSelector` in the `values.yaml` file. + Then run `helm upgrade` for the chart. + +3. Set node selector for system managed components + + The node selector setting can be found in the Longhorn UI under **Setting > General > System Managed Components Node Selector.** + +4. Clean up + + If you are changing the node selector so that Longhorn can no longer run on some nodes where it is currently running, + those nodes will go into the `down` state after this process. Verify that there is no replica left on those nodes. + Disable scheduling for those nodes, and delete them in the Longhorn UI. + +## History +Available since v1.1.1 +* [Original feature request](https://github.com/longhorn/longhorn/issues/2199) + + + diff --git a/content/docs/1.5.1/advanced-resources/deploy/priority-class.md b/content/docs/1.5.1/advanced-resources/deploy/priority-class.md new file mode 100644 index 000000000..83c301a38 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/priority-class.md @@ -0,0 +1,42 @@ +--- +title: Priority Class +weight: 6 +--- +The Priority Class setting can be used to set a higher priority on Longhorn workloads in the cluster, preventing them from being the first to be evicted during node pressure situations. + +For more information on how pod priority works, refer to the [official Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/). + +# Setting Priority Class +The Longhorn system contains user deployed components (e.g., Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g., instance manager, engine image, CSI driver, etc.). +You need to set the Priority Class for both types of components. See more details below. + +### Setting Priority Class During Longhorn Installation +1. Set the Priority Class for system managed components: follow the [Customize default settings](../customizing-default-settings/) to set the Priority Class by changing the value for the `priority-class` default setting. +1. Set the Priority Class for user deployed components: modify the Helm chart or deployment YAML file depending on how you deploy Longhorn, as shown in the sketch below.
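A rough sketch of what that could look like in a Helm `values.yaml` is shown below. The `defaultSettings.priorityClass` key and the `high-priority` class name are taken from the default-settings example in these docs; the per-component keys are assumed to mirror the `nodeSelector` layout shown on the Node Selector page and may differ in your chart version, so check the chart's `values.yaml` before relying on them:

```yaml
defaultSettings:
  priorityClass: high-priority        # system managed components
longhornManager:
  priorityClass: high-priority        # assumed key; verify against your chart
longhornDriver:
  priorityClass: high-priority        # assumed key; verify against your chart
longhornUI:
  priorityClass: high-priority        # assumed key; verify against your chart
```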
+ +> **Warning:** Longhorn will not start if the Priority Class setting is invalid (such as the Priority Class not existing). +> You can see if this is the case by checking the status of the longhorn-manager DaemonSet with `kubectl -n longhorn-system describe daemonset.apps/longhorn-manager`. +> You will need to uninstall Longhorn and restart the installation if this is the case. + +### Setting Priority Class After Longhorn Installation + +1. Set the Priority Class for system managed components: The Priority Class setting can be found in the Longhorn UI by clicking **Setting > General > Priority Class.** +1. Set the Priority Class for user deployed components: modify the Helm chart or deployment YAML file depending on how you deploy Longhorn. + +Users can update or remove the Priority Class here, but note that this will result in the recreation of all the Longhorn system components. +The Priority Class setting will reject values that appear to be invalid Priority Classes. + +# Usage + +Before modifying the Priority Class setting, all Longhorn volumes must be detached. + +Since all Longhorn components will be restarted, the Longhorn system will temporarily be unavailable. If there are running Longhorn volumes in the system, the Longhorn system will not be able to restart its components, and the request will be rejected. + +Don't operate the Longhorn system after modifying the Priority Class setting, as the Longhorn components will be restarting. + +Do not delete the Priority Class in use by Longhorn, as this can cause new Longhorn workloads to fail to come online. + +## History +[Original Feature Request](https://github.com/longhorn/longhorn/issues/1487) + +Available since v1.0.1 diff --git a/content/docs/1.5.1/advanced-resources/deploy/rancher_windows_cluster.md b/content/docs/1.5.1/advanced-resources/deploy/rancher_windows_cluster.md new file mode 100644 index 000000000..4dc51576f --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/rancher_windows_cluster.md @@ -0,0 +1,45 @@ +--- +title: Rancher Windows Cluster +weight: 5 +--- + +Rancher can provision a Windows cluster with a combination of Linux worker nodes and Windows worker nodes. +For more information on the Rancher Windows cluster, see the official [Rancher documentation](https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/rke-clusters/windows-clusters/). + +In a Rancher Windows cluster, all Linux worker nodes are: +- Tainted with the taint `cattle.io/os=linux:NoSchedule` +- Labeled with `kubernetes.io/os:linux` + +Follow [Deploy Longhorn With Supported Helm Chart](#deploy-longhorn-with-supported-helm-chart) or [Setup Longhorn Components For Existing Longhorn](#setup-longhorn-components-for-existing-longhorn) below to learn how to deploy or set up Longhorn on a Rancher Windows cluster. + +> **Note**: After Longhorn is deployed, you can launch workloads that use Longhorn volumes only on Linux nodes. + +## Deploy Longhorn With Supported Helm Chart +You can update the Helm value `global.cattle.windowsCluster.enabled` to allow Longhorn installation on the Rancher Windows cluster. + +When this value is set to `true`, Longhorn will recognize the Rancher Windows cluster and then deploy Longhorn components with the correct node selector and tolerations so that all Longhorn workloads can be launched on Linux nodes only. 
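+
+If you install the chart with the Helm CLI instead of through the Rancher UI, the same value can be set directly on the command line. This is only a sketch: it assumes the standard `longhorn/longhorn` chart from the official `https://charts.longhorn.io` repository.
+```shell
+# Sketch: enable Rancher Windows cluster support at install time
+helm repo add longhorn https://charts.longhorn.io
+helm repo update
+helm install longhorn longhorn/longhorn \
+  --namespace longhorn-system --create-namespace \
+  --set global.cattle.windowsCluster.enabled=true
+```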
+ +On the Rancher marketplace, the setting can be customized in `customize Helm options` before installation: \ +`Edit Options` > `Other Settings` > `Rancher Windows Cluster` + +Also in: \ +`Edit YAML` +``` +global: + cattle: + systemDefaultRegistry: "" + windowsCluster: + # Enable this to allow Longhorn to run on the Rancher deployed Windows cluster + enabled: true +``` + +## Setup Longhorn Components For Existing Longhorn +You can setup the existing Longhorn when its not deployed with the supported Helm chart. + +1. Since Longhorn components can only run on Linux nodes, + you need to set node selector `kubernetes.io/os:linux` for Longhorn to select the Linux nodes. + Please follow the instruction at [Node Selector](../node-selector) to set node selector for Longhorn. + +1. Since all Linux worker nodes in Rancher Windows cluster are tainted with the taint `cattle.io/os=linux:NoSchedule`, + You need to set the toleration `cattle.io/os=linux:NoSchedule` for Longhorn to be able to run on those nodes. + Please follow the instruction at [Taint Toleration](../taint-toleration) to set toleration for Longhorn. diff --git a/content/docs/1.5.1/advanced-resources/deploy/revision_counter.md b/content/docs/1.5.1/advanced-resources/deploy/revision_counter.md new file mode 100644 index 000000000..3a62f1b45 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/revision_counter.md @@ -0,0 +1,54 @@ +--- +title: Revision Counter +weight: 7 +--- + +The revision counter is a strong mechanism that Longhorn uses to track each replica's updates. + +During replica creation, Longhorn will create a 'revision.counter' file with its initial counter set to 0. And for every write to the replica, the counter in 'revision.counter' file will be increased by 1. + +The Longhorn Engine will use these counters to make sure all replicas are consistent during start time. These counters are also used during salvage recovery to decide which replica has the latest update. + +Disable Revision Counter is an option in which every write on replicas is not tracked. When this setting is used, performance is improved, but the strong tracking for each replica is lost. This option can be helpful if you prefer higher performance and have a stable network infrastructure (e.g. an internal network) with enough CPU resources. When the Longhorn Engine starts, it will skip checking the revision counter for all replicas, but auto-salvage will still be supported through the replica's head file stat. For details on how auto-salvage works without the revision counter, refer to [this section.](#auto-salvage-support-with-revision-counter-disabled) + +By default, the revision counter is enabled. + +> **Note:** 'Salvage' is Longhorn trying to recover a volume in a faulted state. A volume is in a faulted state when the Longhorn Engine loses the connection to all the replicas, and all replicas are marked as being in an error state. + +# Disable Revision Counter +## Using Longhorn UI +To disable or enable the revision counter from the Longhorn UI, click **Setting > General > Disable Revision Counter.** + +To create individual volumes with settings that are customized against the general settings, go to the **Volume** page and click **Create Volume.** + +## Using a Manifest File + +A `StorageClass` can be customized to add a `disableRevisionCounter` parameter. + +By default, the `disableRevisionCounter` is false, so the revision counter is enabled. 
+ +Set `disableRevisionCounter` to true to disable the revision counter: + +```yaml +kind: StorageClass +apiVersion: storage.k8s.io/v1 +metadata: + name: best-effort-longhorn +provisioner: driver.longhorn.io +allowVolumeExpansion: true +parameters: + numberOfReplicas: "1" + disableRevisionCounter: "true" + staleReplicaTimeout: "2880" # 48 hours in minutes + fromBackup: "" +``` + +## Auto-Salvage Support with Revision Counter Disabled +The logic for auto-salvage is different when the revision counter is disabled. + +When revision counter is enabled and all the replicas in the volume are in the 'ERR' state, the engine controller will be in a faulted state, and for engine to recover the volume, it will get the replica with the largest revision counter as 'Source of Truth' to rebuild the rest replicas. + +When the revision counter is disabled in this case, the engine controller will get the `volume-head-xxx.img` last modified time and head file size of all replicas. It will also do the following steps: +1. Based on the time that `volume-head-xxx.img` was last modified, get the latest modified replica, and any replica that was last modified within five seconds can be put in the candidate replicas for now. +2. Compare the head file size for all the candidate replicas, and pick the one with the largest file size as the source of truth. +3. The replica chosen as the source of truth is changed to 'RW' mode, and the rest of the replicas are marked as 'ERR' mode. Replicas are rebuilt based on the replica chosen as the source of truth. diff --git a/content/docs/1.5.1/advanced-resources/deploy/storage-network.md b/content/docs/1.5.1/advanced-resources/deploy/storage-network.md new file mode 100644 index 000000000..b5c6eceed --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/storage-network.md @@ -0,0 +1,47 @@ +--- +title: Storage Network +weight: 8 +--- + +By Default, Longhorn uses the default Kubernetes cluster CNI network that is limited to a single interface and shared with other workloads cluster-wide. In case you have a situation where network segregation is needed, Longhorn supports isolating Longhorn in-cluster data traffic with the Storage Network setting. + +The Storage Network setting takes Multus NetworkAttachmentDefinition in `/` format. + +You can refer to [Comprehensive Document](https://github.com/k8snetworkplumbingwg/multus-cni#comprehensive-documentation) for how to install and set up Multus NetworkAttachmentDefintion. + +Applying the setting will add `k8s.v1.cni.cncf.io/networks` annotation and recreate all existing instance-manager, and backing-image-manager pods. +Longhorn will apply the same annotation to any new instance-manager, backing-image-manager, and backing-image-data-source pods. + +> **Warning**: Do not change this setting with volumes attached. +> +> Longhorn will try to block this setting update when there are attached volumes. + +# Setting Storage Network + +## Prerequisite + +The Multus NetworkAttachmentDefinition network for the storage network setting must be reachable in pods across different cluster nodes. + +You can verify by creating a simple DaemonSet and try ping between pods. + +### Setting Storage Network During Longhorn Installation +Follow the [Customize default settings](../customizing-default-settings/) to set Storage Network by changing the value for the `storage-network` default setting + +> **Warning:** Longhorn instance-manager will not start if the Storage Network setting is invalid. 
+> +> You can check the events of the instance-manager Pod to see if it is related to an invalid NetworkAttachmentDefintion with `kubectl -n longhorn-system describe pods -l longhorn.io/component=instance-manager`. +> +> If this is the case, provide a valid `NetworkAttachmentDefinition` and re-run Longhorn install. + +### Setting Storage Network After Longhorn Installation + +Set the setting [Storage Network](../../../references/settings#storage-network). + +> **Warning:** Do not modify the NetworkAttachmentDefinition custom resource after applying it to the setting. +> +> Longhorn is not aware of the updates. Hence this will cause malfunctioning and error. Instead, you can create a new NetworkAttachmentDefinition custom resource and update it to the setting. + +## History +[Original Feature Request](https://github.com/longhorn/longhorn/issues/2285) + +Available since v1.3.0 diff --git a/content/docs/1.5.1/advanced-resources/deploy/taint-toleration.md b/content/docs/1.5.1/advanced-resources/deploy/taint-toleration.md new file mode 100644 index 000000000..658d5fd22 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/deploy/taint-toleration.md @@ -0,0 +1,107 @@ +--- +title: Taints and Tolerations +weight: 3 +--- + +If users want to create nodes with large storage spaces and/or CPU resources for Longhorn only (to store replica data) and reject other general workloads, they can taint those nodes and add tolerations for Longhorn components. Then Longhorn can be deployed on those nodes. + +Notice that the taint tolerations setting for one workload will not prevent it from being scheduled to the nodes that don't contain the corresponding taints. + +For more information about how taints and tolerations work, refer to the [official Kubernetes documentation.](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) + +# Setting up Taints and Tolerations +Longhorn system contains user deployed components (e.g, Manager, Driver Deployer, UI) and system managed components (e.g, Instance Manager, Engine Image, CSI Driver, etc.) +You need to set tolerations for both types of components. See more details below. + +### Setting up Taints and Tolerations During installing Longhorn +1. Set taint tolerations for user deployed components (Manager, UI, Driver Deployer) + * If you install Longhorn by Rancher 2.5.x, you need to click `Edit as YAML` in Rancher UI and copy this values into the YAML: + ```yaml + longhornManager: + tolerations: + - key: "key" + operator: "Equal" + value: "value" + effect: "NoSchedule" + longhornDriver: + tolerations: + - key: "key" + operator: "Equal" + value: "value" + effect: "NoSchedule" + longhornUI: + tolerations: + - key: "key" + operator: "Equal" + value: "value" + effect: "NoSchedule" + ``` + * If you install Longhorn by using `kubectl` to apply [the deployment YAML](https://raw.githubusercontent.com/longhorn/longhorn/v1.1.1/deploy/longhorn.yaml), you need to modify the taint tolerations section for Longhorn Manager, Longhorn UI, and Longhorn Driver Deployer. + Then apply the YAMl files. + * If you install Longhorn using Helm, you can change the Helm values for `longhornManager.tolerations`, `longhornUI.tolerations`, `longhornDriver.tolerations` in the `values.yaml` file. + Then install the chart + +2. 
Set taint tolerations for system managed components + + Follow the [Customize default settings](../customizing-default-settings/) to set taint tolerations by changing the value for the `taint-toleration` default setting. + > Note: Because of a limitation of Rancher 2.5.x, if you are using the Rancher UI to install Longhorn, you need to click `Edit As Yaml` and add the setting `taintToleration` to `defaultSettings`. + > + > For example: + > ```yaml + > defaultSettings: + > taintToleration: "key=value:NoSchedule" + > ``` + +### Setting up Taints and Tolerations After Longhorn has been installed + +> **Warning**: +> +> Before modifying the toleration settings, users should make sure all Longhorn volumes are `detached`. +> +> Since all Longhorn components will be restarted, the Longhorn system is temporarily unavailable. +> If there are running Longhorn volumes in the system, the Longhorn system cannot restart its components and the request will be rejected. +> +> Don't operate the Longhorn system while toleration settings are updated and Longhorn components are being restarted. + +1. Prepare + + Stop all workloads and detach all Longhorn volumes. Make sure all Longhorn volumes are `detached`. + +2. Set taint tolerations for user deployed components (Manager, UI, Driver Deployer) + * If you install Longhorn by Rancher 2.5.x, you need to click `Edit as YAML` in the Rancher UI and copy these values into the YAML: + ```yaml + longhornManager: + tolerations: + - key: "key" + operator: "Equal" + value: "value" + effect: "NoSchedule" + longhornDriver: + tolerations: + - key: "key" + operator: "Equal" + value: "value" + effect: "NoSchedule" + longhornUI: + tolerations: + - key: "key" + operator: "Equal" + value: "value" + effect: "NoSchedule" + ``` + * If you install Longhorn by using `kubectl` to apply [the deployment YAML](https://raw.githubusercontent.com/longhorn/longhorn/v1.1.1/deploy/longhorn.yaml), you need to modify the taint tolerations section for Longhorn Manager, Longhorn UI, and Longhorn Driver Deployer. + Then reapply the YAML files. + * If you install Longhorn using Helm, you can change the Helm values for `longhornManager.tolerations`, `longhornUI.tolerations`, and `longhornDriver.tolerations` in the `values.yaml` file. + Then upgrade the chart with Helm. + +3. Set taint tolerations for system managed components + + The taint toleration setting can be found in the Longhorn UI under **Setting > General > Kubernetes Taint Toleration.** + + +## History +Available since v0.6.0 +* [Original feature request](https://github.com/longhorn/longhorn/issues/584) +* [Resolve the problem with GitOps](https://github.com/longhorn/longhorn/issues/2120) + + diff --git a/content/docs/1.5.1/advanced-resources/fast-replica-rebuild.md b/content/docs/1.5.1/advanced-resources/fast-replica-rebuild.md new file mode 100644 index 000000000..aa3b095e7 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/fast-replica-rebuild.md @@ -0,0 +1,23 @@ +--- +title: Fast Replica Rebuild +weight: 4 +--- + +Longhorn supports fast replica rebuilding based on the checksums of snapshot disk files. + +## Introduction + +The legacy replica rebuilding process walks through all snapshot disk files. For each data block, the client (healthy replica) hashes the local data block and requests the checksum of the corresponding data block on the remote side (rebuilt replica). Then, the client compares the two checksums to determine whether the data block needs to be sent to the remote side to overwrite the data block. 
Thus, it is an IO- and computing-intensive process, especially if the volume is large or contains a large number of snapshot files. + +If users enable the snapshot data integrity check feature by configuring `snapshot-data-integrity` to `enabled` or `fast-check`, the change timestamps and the checksums of snapshot disk files are recorded. As long as the two conditions below are met, Longhorn can skip synchronizing a snapshot disk file. +- The change timestamp on the snapshot disk file matches the recorded value. +- The local and remote snapshot disk files have the same checksum. + +Skipping these unnecessary computations speeds up the entire process and reduces the impact on system performance. + +## Settings +### Global Settings +- fast-replica-rebuild-enabled
+ + The setting enables fast replica rebuilding feature. It relies on the checksums of snapshot disk files, so setting the snapshot-data-integrity to **enable** or **fast-check** is a prerequisite. Please refer to [Snapshot Data Integrity Check](../snapshot-data-integrity-check). + diff --git a/content/docs/1.5.1/advanced-resources/iscsi.md b/content/docs/1.5.1/advanced-resources/iscsi.md new file mode 100644 index 000000000..820144e50 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/iscsi.md @@ -0,0 +1,27 @@ +--- +title: Use Longhorn Volume as an iSCSI Target +weight: 4 +--- + +Longhorn supports iSCSI target frontend mode. You can connect to it +through any iSCSI client, including `open-iscsi`, and virtual machine +hypervisor like KVM, as long as it's in the same network as the Longhorn system. + +The Longhorn CSI driver doesn't support iSCSI mode. + +To start a volume with the iSCSI target frontend mode, select `iSCSI` as the frontend when [creating the volume.](../../volumes-and-nodes/create-volumes) + +After the volume has been attached, you will see something like the following in the `endpoint` field: + +```text +iscsi://10.42.0.21:3260/iqn.2014-09.com.rancher:testvolume/1 +``` + +In this example, + +- The IP and port is `10.42.0.21:3260`. +- The target name is `iqn.2014-09.com.rancher:testvolume`. +- The volume name is `testvolume`. +- The LUN number is 1. Longhorn always uses LUN 1. + +The above information can be used to connect to the iSCSI target provided by Longhorn using an iSCSI client. diff --git a/content/docs/1.5.1/advanced-resources/migrating-flexvolume.md b/content/docs/1.5.1/advanced-resources/migrating-flexvolume.md new file mode 100644 index 000000000..95bfd2678 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/migrating-flexvolume.md @@ -0,0 +1,17 @@ +--- +title: Migrating from the Flexvolume Driver to CSI +weight: 5 +--- + +As of Longhorn v0.8.0, the Flexvolume driver is no longer supported. This guide will show you how to migrate from the Flexvolume driver to CSI. CSI is the newest out-of-tree Kubernetes storage interface. + +> Note that the volumes created and used through one driver won't be recognized by Kubernetes using the other driver. So please don't switch the driver (e.g. during an upgrade) if you have existing volumes created using the old driver. + +Ensure your Longhorn App is up to date. Follow the relevant upgrade procedure before proceeding. + +The migration path between drivers requires backing up and restoring each volume and will incur both API and workload downtime. This can be a tedious process. Consider deleting unimportant workloads using the old driver to reduce effort. + +1. [Back up existing volumes](../../snapshots-and-backups/backup-and-restore/create-a-backup). +2. On Rancher UI, navigate to the `Catalog Apps` screen, locate the `Longhorn` app and click the `Up to date` button. Under `Kubernetes Driver`, select +`flexvolume`. We recommend leaving `Flexvolume Path` empty. Click `Upgrade`. +3. Restore each volume. This [procedure](../../snapshots-and-backups/backup-and-restore/restore-statefulset) is tailored to the StatefulSet workload, but the process is approximately the same for all workloads. 
\ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/orphaned-data-cleanup.md b/content/docs/1.5.1/advanced-resources/orphaned-data-cleanup.md new file mode 100644 index 000000000..33fa8513c --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/orphaned-data-cleanup.md @@ -0,0 +1,150 @@ +--- +title: Orphaned Data Cleanup +weight: 4 +--- + +Longhorn supports orphaned data cleanup. Currently, Longhorn can identify and clean up the orphaned replica directories on disks. + +## Orphaned Replica Directories + +When a user introduces a disk into a Longhorn node, it may contain replica directories that are not tracked by the Longhorn system. The untracked replica directories may belong to other Longhorn clusters. Or, the replica CRs associated with the replica directories are removed after the node or the disk is down. When the node or the disk comes back, the corresponding replica data directories are no longer tracked by the Longhorn system. These replica data directories are called orphaned. + +Longhorn supports the detection and cleanup of orphaned replica directories. It identifies the directories and gives a list of `orphan` resources that describe those directories. By default, Longhorn does not automatically delete `orphan` resources and their directories. Users can trigger the deletion of orphaned replica directories manually or have it done automatically. + +### Example + +In the example, we will explain how to manage orphaned replica directories identified by Longhorn via `kubectl` and Longhorn UI. + +#### Manage Orphaned Replica Directories via kubectl + +1. Introduce disks containing orphaned replica directories. + - Orphaned replica directories on Node `worker1` disks + ``` + # ls /mnt/disk/replicas/ + pvc-19c45b11-28ee-4802-bea4-c0cabfb3b94c-15a210ed + ``` + - Orphaned replica directories on Node `worker2` disks + ``` + # ls /var/lib/longhorn/replicas/ + pvc-28255b31-161f-5621-eea3-a1cbafb4a12a-866aa0a5 + + # ls /mnt/disk/replicas/ + pvc-19c45b11-28ee-4802-bea4-c0cabfb3b94c-a86771c0 + ``` + +2. Longhorn detects the orphaned replica directories and creates an `orphan` resources describing the directories. + ``` + # kubectl -n longhorn-system get orphans + NAME TYPE NODE + orphan-fed8c6c20965c7bdc3e3bbea5813fac52ccd6edcbf31e578f2d8bab93481c272 replica rancher60-worker1 + orphan-637f6c01660277b5333f9f942e4b10071d89379dbe7b4164d071f4e1861a1247 replica rancher60-worker2 + orphan-6360f22930d697c74bec4ce4056c05ac516017b908389bff53aca0657ebb3b4a replica rancher60-worker2 + ``` +3. One can list the `orphan` resources created by Longhorn system by `kubectl -n longhorn-system get orphan`. + ``` + kubectl -n longhorn-system get orphan + ``` + +4. Get the detailed information of one of the orphaned replica directories in `spec.parameters` by `kubcel -n longhorn-system get orphan `. + ``` + # kubectl -n longhorn-system get orphans orphan-fed8c6c20965c7bdc3e3bbea5813fac52ccd6edcbf31e578f2d8bab93481c272 -o yaml + apiVersion: longhorn.io/v1beta2 + kind: Orphan + metadata: + creationTimestamp: "2022-04-29T10:17:40Z" + finalizers: + - longhorn.io + generation: 1 + labels: + longhorn.io/component: orphan + longhorn.io/managed-by: longhorn-manager + longhorn.io/orphan-type: replica + longhornnode: rancher60-worker1 + + ...... 
+ + spec: + nodeID: rancher60-worker1 + orphanType: replica + parameters: + DataName: pvc-19c45b11-28ee-4802-bea4-c0cabfb3b94c-15a210ed + DiskName: disk-1 + DiskPath: /mnt/disk/ + DiskUUID: 90f00e61-d54e-44b9-a095-35c2b56a0462 + status: + conditions: + - lastProbeTime: "" + lastTransitionTime: "2022-04-29T10:17:40Z" + message: "" + reason: "" + status: "True" + type: DataCleanable + - lastProbeTime: "" + lastTransitionTime: "2022-04-29T10:17:40Z" + message: "" + reason: "" + status: "False" + type: Error + ownerID: rancher60-worker1 + ``` + +5. One can delete the `orphan` resource by `kubectl -n longhorn-system delete orphan ` and then the corresponding orphaned replica directory will be deleted. + ``` + # kubectl -n longhorn-system delete orphan orphan-fed8c6c20965c7bdc3e3bbea5813fac52ccd6edcbf31e578f2d8bab93481c272 + + # kubectl -n longhorn-system get orphans + NAME TYPE NODE + orphan-637f6c01660277b5333f9f942e4b10071d89379dbe7b4164d071f4e1861a1247 replica rancher60-worker2 + orphan-6360f22930d697c74bec4ce4056c05ac516017b908389bff53aca0657ebb3b4a replica rancher60-worker2 + ``` + + The orphaned replica directory is deleted. + ``` + # ls /mnt/disk/replicas/ + + ``` + +6. By default, Longhorn will not automatically delete the orphaned replica directory. One can enable the automatic deletion by setting `orphan-auto-deletion` to `true`. + ``` + # kubectl -n longhorn-system edit settings.longhorn.io orphan-auto-deletion + ``` + Then, set the value to `true`. + + ``` + # kubectl -n longhorn-system get settings.longhorn.io orphan-auto-deletion + NAME VALUE AGE + orphan-auto-deletion true 26m + ``` + +7. After enabling the automatic deletion and wait for a while, the `orphan` resources and directories are deleted automatically. + ``` + # kubectl -n longhorn-system get orphans.longhorn.io + No resources found in longhorn-system namespace. + ``` + The orphaned replica directories are deleted. + ``` + # ls /mnt/disk/replicas/ + + # ls /var/lib/longhorn/replicas/ + + ``` + + Additionally, one can delete all orphaned replica directories on the specified node by + ``` + # kubectl -n longhorn-system delete orphan -l "longhornnode=” + ``` + +#### Manage Orphaned Replica Directories via Longhorn UI + +In the top navigation bar of the Longhorn UI, click `Setting > Orphaned Data`. Orphaned replica directories on each node and in each disk are listed. One can delete the directories by `Operation > Delete`. + +By default, Longhorn will not automatically delete the orphaned replica directory. One can enable the automatic deletion in `Setting > General > Orphan`. + +### Exception +Longhorn will not create an `orphan` resource for an orphaned directory when +- The orphaned directory is not an **orphaned replica directory**. + - The directory name does not follow the replica directory's naming convention. + - The volume volume.meta file is missing. +- The orphaned replica directory is on an evicted node. +- The orphaned replica directory is in an evicted disk. +- The orphaned data cleanup mechanism does not clean up a stable replica, also known as an error replica. Instead, the stale replica is cleaned up according to the [staleReplicaTimeout](../../volumes-and-nodes/create-volumes/#creating-longhorn-volumes-with-kubectl) setting. 
\ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/os-distro-specific/_index.md b/content/docs/1.5.1/advanced-resources/os-distro-specific/_index.md new file mode 100644 index 000000000..0ceff6707 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/os-distro-specific/_index.md @@ -0,0 +1,5 @@ +--- +title: OS/Distro Specific Configuration +weight: 3 +--- + diff --git a/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-gke.md b/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-gke.md new file mode 100644 index 000000000..2d1283d24 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-gke.md @@ -0,0 +1,16 @@ +--- +title: Longhorn CSI on GKE +weight: 3 +--- + +To operate Longhorn on a cluster provisioned with Google Kubernetes Engine, some additional configuration is required. + +1. GKE clusters must use the `Ubuntu` OS instead of `Container-Optimized` OS, in order to satisfy Longhorn's `open-iscsi` dependency. + +2. GKE requires a user to manually claim themselves as cluster admin to enable role-based access control. Before installing Longhorn, run the following command: + + ```shell + kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user= + ``` + + where `name@example.com` is the user's account name in GCE. It's case sensitive. See [this document](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control) for more information. \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-k3s.md b/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-k3s.md new file mode 100644 index 000000000..9cba37194 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-k3s.md @@ -0,0 +1,55 @@ +--- + title: Longhorn CSI on K3s + weight: 1 +--- + +In this section, you'll learn how to install Longhorn on a K3s Kubernetes cluster. [K3s](https://rancher.com/docs/k3s/latest/en/) is a fully compliant Kubernetes distribution that is easy to install, using half the memory, all in a binary of less than 50mb. + +## Requirements + + - Longhorn v0.7.0 or higher. + - `open-iscsi` or `iscsiadm` installed on the node. + +## Instruction + + Longhorn v0.7.0 and above support k3s v0.10.0 and above only by default. + + If you want to deploy these new Longhorn versions on versions before k3s v0.10.0, you need to set `--kubelet-root-dir` to `/agent/kubelet` for the Deployment `longhorn-driver-deployer` in `longhorn/deploy/longhorn.yaml`. + `data-dir` is a `k3s` arg and it can be set when you launch a k3s server. By default it is `/var/lib/rancher/k3s`. + +## Troubleshooting + +### Common issues + +#### Failed to get arg root-dir: Cannot get kubelet root dir, no related proc for root-dir detection ... + +This error is due to Longhorn cannot detect where is the root dir setup for Kubelet, so the CSI plugin installation failed. + +You can override the root-dir detection by setting environment variable `KUBELET_ROOT_DIR` in https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/deploy/longhorn.yaml. + +#### How to find `root-dir`? + +**For K3S prior to v0.10.0** + +Run `ps aux | grep k3s` and get argument `--data-dir` or `-d` on k3s node. + +e.g. 
+``` +$ ps uax | grep k3s +root 4160 0.0 0.0 51420 3948 pts/0 S+ 00:55 0:00 sudo /usr/local/bin/k3s server --data-dir /opt/test/kubelet +root 4161 49.0 4.0 259204 164292 pts/0 Sl+ 00:55 0:04 /usr/local/bin/k3s server --data-dir /opt/test/kubelet +``` +You will find `data-dir` in the cmdline of proc `k3s`. By default it is not set and `/var/lib/rancher/k3s` will be used. Then joining `data-dir` with `/agent/kubelet` you will get the `root-dir`. So the default `root-dir` for K3S is `/var/lib/rancher/k3s/agent/kubelet`. + +If K3S is using a configuration file, you would need to check the configuration file to locate the `data-dir` parameter. + +**For K3S v0.10.0+** + +It is always `/var/lib/kubelet` + +## Background +#### Longhorn versions before v0.7.0 don't work on K3S v0.10.0 or above +K3S now sets its kubelet directory to `/var/lib/kubelet`. See [the K3S release comment](https://github.com/rancher/k3s/releases/tag/v0.10.0) for details. + +## Reference +https://github.com/kubernetes-csi/driver-registrar diff --git a/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-rke-and-coreos.md b/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-rke-and-coreos.md new file mode 100644 index 000000000..563d07f2a --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/os-distro-specific/csi-on-rke-and-coreos.md @@ -0,0 +1,87 @@ +--- + title: Longhorn CSI on RKE and CoreOS + weight: 2 +--- + +For minimalist Linux Operating systems, you'll need a little extra configuration to use Longhorn with RKE (Rancher Kubernetes Engine). This document outlines the requirements for using RKE and CoreOS. + +### Background + +CSI doesn't work with CoreOS + RKE before Longhorn v0.4.1. The reason is that in the case of CoreOS, RKE sets the argument `root-dir=/opt/rke/var/lib/kubelet` for the kubelet , which is different from the default value `/var/lib/kubelet`. + +**For k8s v1.12+**, the kubelet will detect the `csi.sock` according to argument `<--kubelet-registration-path>` passed in by Kubernetes CSI driver-registrar, and `-reg.sock` (for Longhorn, it's `io.rancher.longhorn-reg.sock`) on kubelet path `/plugins`. + + **For k8s v1.11,** the kubelet will find both sockets on kubelet path `/var/lib/kubelet/plugins`. + +By default, Longhorn CSI driver creates and expose these two sock files on the host path `/var/lib/kubelet/plugins`. Then the kubelet cannot find `-reg.sock`, so CSI driver doesn't work. + +Furthermore, the kubelet will instruct the CSI plugin to mount the Longhorn volume on `/pods//volumes/kubernetes.io~csi//mount`. But this path inside the CSI plugin container won't be bind mounted on the host path. And the mount operation for the Longhorn volume is meaningless. + +Therefore, in this case, Kubernetes cannot connect to Longhorn using the CSI driver without additional configuration. + +### Requirements + + - Kubernetes v1.11 or higher. + - Longhorn v0.4.1 or higher. + +### 1. Add extra binds for the kubelet + +> This step is only required for For CoreOS + and Kubernetes v1.11. It is not needed for Kubernetes v1.12+. + +Add extra_binds for kubelet in RKE `cluster.yml`: + +``` + +services: + kubelet: + extra_binds: + - "/opt/rke/var/lib/kubelet/plugins:/var/lib/kubelet/plugins" + +``` + +This makes sure the kubelet plugins directory is exposed for CSI driver installation. + +### 2. 
Start the iSCSI Daemon + +If you want to enable iSCSI daemon automatically at boot, you need to enable the systemd service: + +``` +sudo su +systemctl enable iscsid +reboot +``` + +Or just start the iSCSI daemon for the current session: + +``` +sudo su +systemctl start iscsid +``` + +### Troubleshooting + +#### Failed to get arg root-dir: Cannot get kubelet root dir, no related proc for root-dir detection ... + +This error happens because Longhorn cannot detect the root dir setup for the kubelet, so the CSI plugin installation failed. + +You can override the root-dir detection by setting environment variable `KUBELET_ROOT_DIR` in https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/deploy/longhorn.yaml. + +#### How to find `root-dir`? + +Run `ps aux | grep kubelet` and get the argument `--root-dir` on host node. + +For example, +``` + +$ ps aux | grep kubelet +root 3755 4.4 2.9 744404 120020 ? Ssl 00:45 0:02 kubelet --root-dir=/opt/rke/var/lib/kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins + +``` +You will find `root-dir` in the cmdline of proc `kubelet`. If it's not set, the default value `/var/lib/kubelet` would be used. In the case of CoreOS, the root-dir would be `/opt/rke/var/lib/kubelet` as shown above. + +If the kubelet is using a configuration file, you need to check the configuration file to locate the `root-dir` parameter. + +### References +https://github.com/kubernetes-csi/driver-registrar + +https://coreos.com/os/docs/latest/iscsi.html diff --git a/content/docs/1.5.1/advanced-resources/rancher-cluster-restore.md b/content/docs/1.5.1/advanced-resources/rancher-cluster-restore.md new file mode 100644 index 000000000..551e4b348 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/rancher-cluster-restore.md @@ -0,0 +1,54 @@ +--- +title: Restore cluster with a Rancher snapshot +weight: 4 +--- + +This doc describes what users need to do after restoring the cluster with a Rancher snapshot. + +## Assumptions: +- Most of the data and the underlying disks still exist in the cluster before the restore and can be directly reused then. +- There is a backupstore holding all volume data. +- The setting [`Disable Revision Counter`](../../references/settings/#disable-revision-counter) is false. (It's false by default.) Otherwise, users need to manually check if the data among volume replicas are consistent, or directly restore volumes from backup. + +## Expectation: +- All settings and node & disk configs will be restored. +- As long as the valid data still exists, the volumes can be recovered without using a backup. In other words, we will try to avoid restoring backups, which may help reduce Recovery Time Objective (RTO) as well as save bandwidth. +- Detect the invalid or out-of-sync replicas as long as the related volume still contains a valid replica after the restore. + +## Behaviors & Requirement of Rancher restore +- According to [the Rancher restore article](https://rancher.com/blog/2018/2018-05-30-recover-rancher-kubernetes-cluster-from-backup/), you have to restart the Kubernetes components on all nodes. Otherwise, there will be tons of resource update conflicts in Longhorn. + +## Actions after the restore +- Restart all Kubernetes components for all nodes. See the above link for more details. + +- Kill all longhorn manager pods then Kubernetes will automatically restart them. Wait for conflicts in longhorn manager pods to disappear. + +- All volumes may be reattached. 
If a Longhorn volume is used by a single pod, users need to shut down then recreate it. For Deployments or Statefulsets, Longhorn will automatically kill then restart the related pods. + +- If the following happens after the snapshot and before the cluster restore: + - A volume is unchanged: Users don't need to do anything. + - The data is updated: Users don't need to do anything typically. Longhorn will automatically fail the replicas that don't contain the latest data. + - A new volume is created: This volume will disappear after the restore. Users need to recreate a new volume, launch [a single replica volume](../data-recovery/export-from-replica) based on the replica of the disappeared volume, then transfer the data to the new volume. + - A volume is deleted: Since the data is cleaned up when the volume is removed, the restored volume contains no data. Users may need to re-delete it. + - For DR volumes: Users don't need to do anything. Longhorn will redo a full restore. + - Some operations are applied for a volume: + - Backup: The backup info of the volume should be resynced automatically. + - Snapshot: The snapshot info of the volume should be resynced once the volume is attached. + - Replica rebuilding & replica removal: + - If there are new replicas rebuilt, those replicas will disappear from the Longhorn system after the restoring. Users need to clean up the replica data manually, or use the data directories of these replicas to export a single replica volume then do data recovery if necessary. + - If there are some failed/removed replicas and there is at least one replica keeping healthy, those failed/removed replicas will be back after the restoration. Then Longhorn can detect these restored replicas do not contain any data, and copy the latest data from the healthy replica to these replicas. + - If all replicas are replaced by new replicas after the snapshot, the volume will contain invalid replicas only after the restore. Then users need to export [a single replica volume](../data-recovery/export-from-replica) for the data recovery. + - Engine image upgrade: Users need to redo the upgrade. + - Expansion: The spec size of the volume will be smaller than the current size. This is like someone requesting volume shrinking but actually Longhorn will refuse to handle it internally. To recover the volume, users need to scale down the workloads and re-do the expansion. + + - **Notice**: If users don't know how to recover a problematic volume, the simplest way is always restoring a new volume from backup. + +- If the Longhorn system is upgraded after the snapshot, the new settings and the modifications on the node config will disappear. Users need to re-do the upgrade, then re-modify the settings and node configurations. + +- If a node is deleted from Longhorn system after the snapshot, the node won't be back, but the pods on the removed node will be restored. Users need to manually clean up them since these pod may get stuck in state `Terminating`. +- If a node to added to Longhorn system after the snapshot, Longhorn should automatically relaunch all necessary workloads on the node after the cluster restore. But users should be aware that all new replicas or engines on this node will be gone after the restore. + + +## References +- The related GitHub issue is https://github.com/longhorn/longhorn/issues/2228. + In this GitHub post, one user is providing a way that restores the Longhorn to a new cluster that doesn't contain any data. 
diff --git a/content/docs/1.5.1/advanced-resources/rwx-workloads.md b/content/docs/1.5.1/advanced-resources/rwx-workloads.md new file mode 100644 index 000000000..6ed422775 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/rwx-workloads.md @@ -0,0 +1,112 @@ +--- +title: ReadWriteMany (RWX) Volume +weight: 4 +--- + +Longhorn supports ReadWriteMany (RWX) volumes by exposing regular Longhorn volumes via NFSv4 servers that reside in share-manager pods. + + +# Introduction + +For each actively in use RWX volume Longhorn will create a `share-manager-` Pod in the `longhorn-system` namespace. This Pod is responsible for exporting a Longhorn volume via a NFSv4 server that is running inside the Pod. There is also a service created for each RWX volume, and that is used as an endpoint for the actual NFSv4 client connection. + +{{< figure src="/img/diagrams/rwx/rwx-arch.png" >}} + +# Requirements + +It is necessary to meet the following requirements in order to use RWX volumes. + +1. Each NFS client node needs to have a NFSv4 client installed. + + Please refer to [Installing NFSv4 client](../../deploy/install/#installing-nfsv4-client) for more installation details. + + > **Troubleshooting:** If the NFSv4 client is not available on the node, when trying to mount the volume the below message will be part of the error: + > ``` + > for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount. helper program. + > ``` + +2. The hostname of each node is unique in the Kubernetes cluster. + + There is a dedicated recovery backend service for NFS servers in Longhorn system. When a client connects to an NFS server, the client's information, including its hostname, will be stored in the recovery backend. When a share-manager Pod or NFS server is abnormally terminated, Longhorn will create a new one. Within the 90-seconds grace period, clients will reclaim locks using the client information stored in the recovery backend. + + > **Tip:** The [environment check script](https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh) helps users to check all nodes have unique hostnames. + +# Creation and Usage of a RWX Volume + +1. For dynamically provisioned Longhorn volumes, the access mode is based on the PVC's access mode. +2. For manually created Longhorn volumes (restore, DR volume) the access mode can be specified during creation in the Longhorn UI. +3. When creating a PV/PVC for a Longhorn volume via the UI, the access mode of the PV/PVC will be based on the volume's access mode. +4. One can change the Longhorn volume's access mode via the UI as long as the volume is not bound to a PVC. +5. For a Longhorn volume that gets used by a RWX PVC, the volume access mode will be changed to RWX. + +# Failure Handling + +1. share-manager Pod is abnormally terminated + + Client IO will be blocked until Longhorn creates a new share-manager Pod and the associated volume. Once the Pod is successfully created, the 90-seconds grace period for lock reclamation is started, and users would expect + - Before the grace period ends, client IO to the RWX volume will still be blocked. + - The server rejects READ and WRITE operations and non-reclaim locking requests with an error of NFS4ERR_GRACE. + - The grace period can be terminated early if all locks are successfully reclaimed. + + After exiting the grace period, IOs of the clients successfully reclaiming the locks continue without stale file handle errors or IO errors. 
If a lock cannot be reclaimed within the grace period, the lock is discarded, and the server returns an IO error to the client. The client re-establishes a new lock. The application should handle the IO error. Nevertheless, not all applications can handle IO errors due to their implementation. Thus, it may result in a failed IO operation and data loss. Data consistency may be an issue. + + Here is an example of a DaemonSet using a RWX volume. + + Each Pod of the DaemonSet is writing data to the RWX volume. If the node where the share-manager Pod is running goes down, a new share-manager Pod is created on another node. Since one of the clients on the failed node is gone, the lock reclaim process cannot end earlier than the 90-second grace period, even though the remaining clients' locks have been successfully reclaimed. The IOs of these clients continue after the grace period has expired. + +2. If the Kubernetes DNS service goes down, share-manager Pods will not be able to communicate with longhorn-nfs-recovery-backend + + The NFS-ganesha server in a share-manager Pod communicates with longhorn-nfs-recovery-backend via the service `longhorn-recovery-backend`'s IP. If the DNS service is unavailable, the creation and deletion of RWX volumes as well as the recovery of NFS servers will be inoperable. Thus, a highly available DNS service is recommended to avoid such communication failures. + +# Migration from Previous External Provisioner + +The Kubernetes Job below copies data from one volume to another. + +- Replace `data-source-pvc` with the name of the previous NFSv4 RWX PVC that was created by Kubernetes. +- Replace `data-target-pvc` with the name of the new RWX PVC that you wish to use for your new workloads. + +You can manually create a new RWX Longhorn volume + PVC/PV, or just create a RWX PVC and then have Longhorn dynamically provision a volume for you. + +Both PVCs need to exist in the same namespace. If you were using a different namespace than the default, change the Job's namespace below. 
+ +```yaml +apiVersion: batch/v1 +kind: Job +metadata: + namespace: default # namespace where the PVC's exist + name: volume-migration +spec: + completions: 1 + parallelism: 1 + backoffLimit: 3 + template: + metadata: + name: volume-migration + labels: + name: volume-migration + spec: + restartPolicy: Never + containers: + - name: volume-migration + image: ubuntu:xenial + tty: true + command: [ "/bin/sh" ] + args: [ "-c", "cp -r -v /mnt/old /mnt/new" ] + volumeMounts: + - name: old-vol + mountPath: /mnt/old + - name: new-vol + mountPath: /mnt/new + volumes: + - name: old-vol + persistentVolumeClaim: + claimName: data-source-pvc # change to data source PVC + - name: new-vol + persistentVolumeClaim: + claimName: data-target-pvc # change to data target PVC +``` + + +# History +* Available since v1.0.1 [External provisioner](https://github.com/Longhorn/Longhorn/issues/1183) +* Available since v1.1.0 [Native RWX support](https://github.com/Longhorn/Longhorn/issues/1470) \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/security/_index.md b/content/docs/1.5.1/advanced-resources/security/_index.md new file mode 100644 index 000000000..b5fe30482 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/security/_index.md @@ -0,0 +1,4 @@ +--- +title: Security +weight: 6 +--- diff --git a/content/docs/1.5.1/advanced-resources/security/mtls-support.md b/content/docs/1.5.1/advanced-resources/security/mtls-support.md new file mode 100644 index 000000000..cfd115e66 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/security/mtls-support.md @@ -0,0 +1,59 @@ +--- +title: MTLS Support +weight: 6 +--- + +Longhorn supports MTLS to secure and encrypt the grpc communication between the control plane (longhorn-manager) and the data plane (instance-managers). +For Certificate setup we use the Kubernetes secret mechanism in combination with an optional secret mount for the longhorn-manager/instance-manager. + + +# Requirements +In a default installation mtls is disabled to enable mtls support one needs to create a `longhorn-grpc-tls` secret in the `longhorn-system` namespace before deployment. +The secret is specified as an optional secret mount for the longhorn-manager/instance-managers so if it does not exist when these +components are started, mtls will not be used and a restart of the components will be required to enable tls support. + +The longhorn-manager has a non tls client fallback for mixed mode setups where there are old instance-managers that were started without tls support. + +# Self Signed Certificate Setup + +You should create a `ca.crt` with the CA flag set which is then used to sign the `tls.crt` this will allow you to rotate the `tls.crt` in the future without service interruptions. +You can use [openssl](https://mariadb.com/docs/security/data-in-transit-encryption/create-self-signed-certificates-keys-openssl/) +or [cfssl](https://github.com/cloudflare/cfssl) for the `ca.crt` as well as `tls.crt` certificate generation. + +The `tls.crt` certificate should use `longhorn-backend` for the common name and the below list of entries for the Subject Alternative Name. 
+```text +Common Name: longhorn-backend +Subject Alternative Names: longhorn-backend, longhorn-backend.longhorn-system, longhorn-backend.longhorn-system.svc, longhorn-frontend, longhorn-frontend.longhorn-system, longhorn-frontend.longhorn-system.svc, longhorn-engine-manager, longhorn-engine-manager.longhorn-system, longhorn-engine-manager.longhorn-system.svc, longhorn-replica-manager, longhorn-replica-manager.longhorn-system, longhorn-replica-manager.longhorn-system.svc, longhorn-csi, longhorn-csi.longhorn-system, longhorn-csi.longhorn-system.svc, longhorn-backend, IP Address:127.0.0.1 +``` + +# Setting up Kubernetes Secrets + +The `ca.crt` is the certificate of the certificate authority that was used to sign +the `tls.crt` which will be used both by the client (longhorn-manager) and the server (instance-manager) for grpc mtls authentication. +The `tls.key` is associated private key for the created `tls.crt`. + +The `longhorn-grpc-tls` yaml looks like the below example, +If you are having trouble getting your own certificates to work you can base decode the below certificate +and compare it against your own generated certificates via `openssl x509 -in tls.crt -text -noout`. +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: longhorn-grpc-tls + namespace: longhorn-system +type: kubernetes.io/tls +data: + ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUREakNDQWZhZ0F3SUJBZ0lVU2tNdUVEOC9XYXphNmpkb1NiTE1qalFqb3JFd0RRWUpLb1pJaHZjTkFRRUwKQlFBd0h6RWRNQnNHQTFVRUF4TVViRzl1WjJodmNtNHRaM0p3WXkxMGJITXRZMkV3SGhjTk1qSXdNVEV4TWpFeApPVEF3V2hjTk1qY3dNVEV3TWpFeE9UQXdXakFmTVIwd0d3WURWUVFERXhSc2IyNW5hRzl5YmkxbmNuQmpMWFJzCmN5MWpZVENDQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFNY2grbTJhUndnNEtBa0EKT0xzdzdScWlWb1VqL2VPbVhuSE9HVE5nWE4rcFh5bDlCdzVDM1J4UDYzU29qaTVvNEhkU1htVmpwZmhmNjh1YwpvNVJJeUtXM1p6cndteDhXZldEc0dNNEtnYXBvMy84N3pVQ00vdGltOHllTzFUbTZlWVhXcWdlZ2JpM1Q1WnlvCmkzRjdteFg3QlU3Z25uWGthVmJ5UU1xRkEyMDJrK25jaVhaUE9iU0tlc1NvZ20wdWsrYXFvY3N1SjJ6dk9tZG0KMXd0a3ZTUklhL3l6T25JRGlmbFRteXNhZ3oxQy9VM1JxbzJ6TjIwbWJNYUJhMmx5anVZWkdWSnNyNGh4dGhqUApIR2x1UUh2QTlKTE9kc2J0T2xmbjRZNlZpUktCSzZWMVpOeVROMVJpN3ArTXZlaWQ3cE9rNHYweC9qVTc1a0N6Clo1cGJHbGtDQXdFQUFhTkNNRUF3RGdZRFZSMFBBUUgvQkFRREFnRUdNQThHQTFVZEV3RUIvd1FGTUFNQkFmOHcKSFFZRFZSME9CQllFRlBGc0xRbmQxOHFUTVd5djh1STk3Z2hnR2djR01BMEdDU3FHU0liM0RRRUJDd1VBQTRJQgpBUUNMcnk5a2xlSElMdDRwbzd4N0hvSldsMEswYjdwV2Y0Y3ZVeHh1bUdTYUpoQmFHNTVlZFNFSVAzajhsRGg1Cm94ZXJlbjNrRUtzeGZiQVQ0RzU3KzBaeExQSkZQcjFMM3JvcmxUVE1DS1QyY2Z1UDJ3SEIzZndWNDJpSHZSUDgKSUVqU041bFNkWjZnN1NjWFZ2RnpZNzlrbVZDQ2RNYlpGcEFuOElyTkh3L0tTUGZUajNob2VyV3ZGL3huaEo3bQpmSzUrcE5TeWR6QTA1K1Y0ODJhWGlvV2NWcWY2UHpSVndmT0tIalUrbUVDQXZMbitNSzRvN1l2VW1iN2tSUGs5CnBjU1A4N2lpN0hwRVhqZUtRaVJhZElXKzMySXp1UTFiOXRYc3BNTGF0UFA5TXNvWmY0M1EyZWw4bWd1RjRxOUcKVmVUZFZaU2hBNWNucmNRZTEySUs1MzAvCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K + tls.crt: 
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUVqekNDQTNlZ0F3SUJBZ0lVUjZWcGR5U1Z0MGp6bDcwQnIxMmdZOTB0QVNBd0RRWUpLb1pJaHZjTkFRRUwKQlFBd0h6RWRNQnNHQTFVRUF4TVViRzl1WjJodmNtNHRaM0p3WXkxMGJITXRZMkV3SGhjTk1qSXdNVEV4TWpFeApPVEF3V2hjTk1qTXdNVEV4TWpFeE9UQXdXakFiTVJrd0Z3WURWUVFERXhCc2IyNW5hRzl5YmkxaVlXTnJaVzVrCk1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRUxOQVVJZUsvdnppaGc3a1Q5d3E4anU4VU51c24Kc2FzdmlpS1VHQnpkblZndlNSdzNhYzd4RTRSQjlmZytjRnVUenpGaFNHRlVLVUpYaVh5d0FXZ0o4YU9DQXBBdwpnZ0tNTUE0R0ExVWREd0VCL3dRRUF3SUZvREFkQmdOVkhTVUVGakFVQmdnckJnRUZCUWNEQVFZSUt3WUJCUVVICkF3SXdEQVlEVlIwVEFRSC9CQUl3QURBZEJnTlZIUTRFRmdRVXgvaDVCOUFMSExuYWJaNjBzT2dvbnA3YlN0VXcKSHdZRFZSMGpCQmd3Rm9BVThXd3RDZDNYeXBNeGJLL3k0ajN1Q0dBYUJ3WXdnZ0lMQmdOVkhSRUVnZ0lDTUlJQgovb0lRYkc5dVoyaHZjbTR0WW1GamEyVnVaSUlnYkc5dVoyaHZjbTR0WW1GamEyVnVaQzVzYjI1bmFHOXliaTF6CmVYTjBaVzJDSkd4dmJtZG9iM0p1TFdKaFkydGxibVF1Ykc5dVoyaHZjbTR0YzNsemRHVnRMbk4yWTRJUmJHOXUKWjJodmNtNHRabkp2Ym5SbGJtU0NJV3h2Ym1kb2IzSnVMV1p5YjI1MFpXNWtMbXh2Ym1kb2IzSnVMWE41YzNSbApiWUlsYkc5dVoyaHZjbTR0Wm5KdmJuUmxibVF1Ykc5dVoyaHZjbTR0YzNsemRHVnRMbk4yWTRJWGJHOXVaMmh2CmNtNHRaVzVuYVc1bExXMWhibUZuWlhLQ0oyeHZibWRvYjNKdUxXVnVaMmx1WlMxdFlXNWhaMlZ5TG14dmJtZG8KYjNKdUxYTjVjM1JsYllJcmJHOXVaMmh2Y200dFpXNW5hVzVsTFcxaGJtRm5aWEl1Ykc5dVoyaHZjbTR0YzNsegpkR1Z0TG5OMlk0SVliRzl1WjJodmNtNHRjbVZ3YkdsallTMXRZVzVoWjJWeWdpaHNiMjVuYUc5eWJpMXlaWEJzCmFXTmhMVzFoYm1GblpYSXViRzl1WjJodmNtNHRjM2x6ZEdWdGdpeHNiMjVuYUc5eWJpMXlaWEJzYVdOaExXMWgKYm1GblpYSXViRzl1WjJodmNtNHRjM2x6ZEdWdExuTjJZNElNYkc5dVoyaHZjbTR0WTNOcGdoeHNiMjVuYUc5eQpiaTFqYzJrdWJHOXVaMmh2Y200dGMzbHpkR1Z0Z2lCc2IyNW5hRzl5YmkxamMya3ViRzl1WjJodmNtNHRjM2x6CmRHVnRMbk4yWTRJUWJHOXVaMmh2Y200dFltRmphMlZ1WkljRWZ3QUFBVEFOQmdrcWhraUc5dzBCQVFzRkFBT0MKQVFFQWV5UlhCWnI5Z1RmTGlsNGMvZElaSlVYeFh4ckFBQmtJTG55QkdNdkFqaFJoRndLZ09VU0MvMGUyeDYvTQpoTi9SWElYVzdBYUF0a25ZSHFLa3piMDZsbWhxczRHNWVjNkZRZDViSGdGbnFPOHNWNEF6WVFSRWhDZjlrWWhUClVlRnJLdDdOQllHNFNXSnNYK2M0ZzU5RlZGZkIzbTZscStoR3JaY085T2NIQ1NvVDM2SVRPeERDT3lrV002WHcKVW5zYWtaaHRwQ3lxdHlwQXZqaURNM3ZTY2txVTFNSWxLSnA1Z3lGT3k2VHVwQ01tYnRiWlRpSEtaN0ZlcmlmcwoyYng4Z0JmaldFQnEwMEhVWTdyY3RFNzFpVk11WURTczAwYTB2c1ZGQ240akppeWFnM0lHWkdud0FHQk1zR2h3ClFJcndjRHgwdy91NGR1VWRNMzBpaU1WZ0pnPT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo= + tls.key: LS0tLS1CRUdJTiBFQyBQUklWQVRFIEtFWS0tLS0tCk1IY0NBUUVFSUwzbjZVZzlhZU1Day9XbkZ2L1pmSTlxMkIyakxnbjFRWGQwcjhIL3k2QkhvQW9HQ0NxR1NNNDkKQXdFSG9VUURRZ0FFTE5BVUllSy92emloZzdrVDl3cThqdThVTnVzbnNhc3ZpaUtVR0J6ZG5WZ3ZTUnczYWM3eApFNFJCOWZnK2NGdVR6ekZoU0dGVUtVSlhpWHl3QVdnSjhRPT0KLS0tLS1FTkQgRUMgUFJJVkFURSBLRVktLS0tLQo= +``` + +For more information on creating a secret, see [the Kubernetes documentation.](https://kubernetes.io/docs/concepts/configuration/secret/#creating-a-secret-manually) The secret must be created in the `longhorn-system` namespace for Longhorn to access it. + +> Note: Make sure to use `echo -n` when generating the base64 encoding, +> otherwise a new line will be added at the end of the string +> which will cause an error during loading of the certificates. + + +# History +Available since v1.3.0 [#3839](https://github.com/longhorn/longhorn/issues/3839) diff --git a/content/docs/1.5.1/advanced-resources/security/volume-encryption.md b/content/docs/1.5.1/advanced-resources/security/volume-encryption.md new file mode 100644 index 000000000..d6a1faadc --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/security/volume-encryption.md @@ -0,0 +1,107 @@ +--- +title: Volume Encryption +weight: 2 +--- + +Longhorn supports encrypted volumes by utilizing the linux kernel module `dm_crypt` via `cryptsetup` for the encryption. 
+In addition, Longhorn uses the Kubernetes secret mechanism for key storage; the secrets themselves can be encrypted and guarded via appropriate permissions. +An encrypted volume results in your data being encrypted in transit as well as at rest. This also means that any backups taken from that volume are encrypted as well. + +# Requirements + +To be able to use encrypted volumes, you will need to have the `dm_crypt` kernel module loaded +and `cryptsetup` installed on your worker nodes. + +# Setting up Kubernetes Secrets +Volume encryption utilizes Kubernetes secrets for encryption key storage. +To configure the secret that will be used for an encrypted volume, you will need to specify the secret as part of the parameters of a storage class. +This mechanism is provided by Kubernetes and allows the use of template parameters that are resolved as part of volume creation. + +The template parameters can be useful when you want to use a per-volume secret or a group secret for a specific collection of volumes. +More information about the available template parameters can be found in the [Kubernetes documentation](https://kubernetes-csi.github.io/docs/secrets-and-credentials-storage-class.html). + +In the example secret below, your encryption key is specified as part of the `CRYPTO_KEY_VALUE` parameter. +We use `stringData` as the type here so we don't have to base64-encode the key before submitting the secret via `kubectl create`. + +Besides `CRYPTO_KEY_VALUE`, the parameters `CRYPTO_KEY_CIPHER`, `CRYPTO_KEY_HASH`, `CRYPTO_KEY_SIZE`, and `CRYPTO_PBKDF` provide further customization for volume encryption. +- `CRYPTO_KEY_CIPHER`: Sets the cipher specification algorithm string. The default value is `aes-xts-plain64` for LUKS. +- `CRYPTO_KEY_HASH`: Specifies the passphrase hash for `open`. The default value is `sha256`. +- `CRYPTO_KEY_SIZE`: Sets the key size in bits; it must be a multiple of 8. The default value is `256`. +- `CRYPTO_PBKDF`: Sets the Password-Based Key Derivation Function (PBKDF) algorithm for the LUKS keyslot. The default value is `argon2i`. 
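+
+The `CRYPTO_KEY_VALUE` passphrase can be any sufficiently strong string. As a sketch, a random 256-bit passphrase can be generated with OpenSSL (any other secure source of randomness works just as well):
+```shell
+# Generate a random 32-byte (256-bit) passphrase, base64 encoded for easy handling
+openssl rand -base64 32
+```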
+ +For more details, you can see the Linux manual page - [crypsetup(8)](https://man7.org/linux/man-pages/man8/cryptsetup.8.html) +```yaml +--- +apiVersion: v1 +kind: Secret +metadata: + name: longhorn-crypto + namespace: longhorn-system +stringData: + CRYPTO_KEY_VALUE: "Your encryption passphrase" + CRYPTO_KEY_PROVIDER: "secret" + CRYPTO_KEY_CIPHER: "aes-xts-plain64" + CRYPTO_KEY_HASH: "sha256" + CRYPTO_KEY_SIZE: "256" + CRYPTO_PBKDF: "argon2i" +``` + +Example storage class (global key for all volumes) +```yaml +kind: StorageClass +apiVersion: storage.k8s.io/v1 +metadata: + name: longhorn-crypto-global +provisioner: driver.longhorn.io +allowVolumeExpansion: true +parameters: + numberOfReplicas: "3" + staleReplicaTimeout: "2880" # 48 hours in minutes + fromBackup: "" + encrypted: "true" + # global secret that contains the encryption key that will be used for all volumes + csi.storage.k8s.io/provisioner-secret-name: "longhorn-crypto" + csi.storage.k8s.io/provisioner-secret-namespace: "longhorn-system" + csi.storage.k8s.io/node-publish-secret-name: "longhorn-crypto" + csi.storage.k8s.io/node-publish-secret-namespace: "longhorn-system" + csi.storage.k8s.io/node-stage-secret-name: "longhorn-crypto" + csi.storage.k8s.io/node-stage-secret-namespace: "longhorn-system" +``` + +Example storage class (per volume key) +```yaml +kind: StorageClass +apiVersion: storage.k8s.io/v1 +metadata: + name: longhorn-crypto-per-volume +provisioner: driver.longhorn.io +allowVolumeExpansion: true +parameters: + numberOfReplicas: "3" + staleReplicaTimeout: "2880" # 48 hours in minutes + fromBackup: "" + encrypted: "true" + # per volume secret which utilizes the `pvc.name` and `pvc.namespace` template parameters + csi.storage.k8s.io/provisioner-secret-name: ${pvc.name} + csi.storage.k8s.io/provisioner-secret-namespace: ${pvc.namespace} + csi.storage.k8s.io/node-publish-secret-name: ${pvc.name} + csi.storage.k8s.io/node-publish-secret-namespace: ${pvc.namespace} + csi.storage.k8s.io/node-stage-secret-name: ${pvc.name} + csi.storage.k8s.io/node-stage-secret-namespace: ${pvc.namespace} +``` + +# Using an encrypted volume + +To create an encrypted volume, you just create a PVC utilizing a storage class that has been configured for encryption. +The above storage class examples can be used as a starting point. + +After creation of the PVC it will remain in `Pending` state till the associated secret has been created and can be retrieved +by the csi `external-provisioner` sidecar. Afterwards the regular volume creation flow will take over and the encryption will be +transparently used so no additional actions are needed from the user. + +# Filesystem expansion + +Longhorn supports offline [expansion](../../../volumes-and-nodes/expansion) for encrypted volumes. + +# History +Available since v1.2.0 [#1859](https://github.com/longhorn/longhorn/issues/1859) diff --git a/content/docs/1.5.1/advanced-resources/snapshot-data-integrity-check.md b/content/docs/1.5.1/advanced-resources/snapshot-data-integrity-check.md new file mode 100644 index 000000000..f74e3c483 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/snapshot-data-integrity-check.md @@ -0,0 +1,62 @@ +--- +title: Snapshot Data Integrity Check +weight: 4 +--- + +Longhorn is capable of hashing snapshot disk files and periodically checking their integrity. + +## Introduction + +Longhorn system supports volume snapshotting and stores the snapshot disk files on the local disk. 
Previously, it was impossible to check the data integrity of snapshots because no checksums were kept for them. As a result, when data was corrupted by, for example, bit rot in the underlying storage, there was no way to detect the corruption and repair the replicas. With this feature, Longhorn hashes snapshot disk files and periodically checks their integrity. When a snapshot disk file in one replica is corrupted, Longhorn automatically starts the rebuilding process to fix it. + +## Settings + +### Global Settings + +- **snapshot-data-integrity**
+ + This setting allows users to enable or disable snapshot hashing and data integrity checking. Available options are: + + - **disabled**: Disables snapshot disk file hashing and data integrity checking. + - **enabled**: Enables periodic snapshot disk file hashing and data integrity checking. To detect filesystem-unaware corruption caused by bit rot or other issues in snapshot disk files, the Longhorn system periodically hashes files and finds corrupted ones. Consequently, system performance is impacted while the periodic check runs. + - **fast-check**: Enables snapshot disk file hashing and fast data integrity checking. The Longhorn system only hashes snapshot disk files if they have not been hashed yet or their modification times have changed. In this mode, filesystem-unaware corruption cannot be detected, but the impact on system performance can be minimized. + +- **snapshot-data-integrity-immediate-check-after-snapshot-creation**
+ + Because hashing snapshot disk files impacts system performance, the immediate hashing and checking that runs right after a snapshot is created can be disabled to minimize the impact. + +- **snapshot-data-integrity-cronjob**
+ + A schedule defined using the unix-cron string format specifies when Longhorn checks the data integrity of snapshot disk files. + + > **Warning** + > Hashing snapshot disk files impacts the performance of the system. It is recommended to run data integrity checks during off-peak times and to reduce the frequency of checks. + +### Per-Volume Settings + +Longhorn also supports the per-volume setting by configuring `Volume.Spec.SnapshotDataIntegrity`. The value is `ignored` by default, so data integrity check is determined by the global setting `snapshot-data-integrity`. `Volume.Spec.SnapshotDataIntegrity` supports `ignored`, `disabled`, `enabled` and `fast-check`. Each volume can have its data integrity check setting customized. + +## Performance Impact + +For detecting data corruption, checksums of snapshot disk files need to be calculated. The calculations consume storage and computation resources. Therefore, the storage performance will be negatively impacted. In order to provide a clear understanding of the impact, we benchmarked storage performance when checksumming disk files. The read IOPS, bandwidth and latency are negatively impacted. + +- Environment + - Host: AWS EC2 c5d.2xlarge + - CPU: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz + - Memory: 16 GB + - Network: Up to 10Gbps + - Kubernetes: v1.24.4+rke2r1 +- Result + - Disk: 200 GiB NVMe SSD as the instance store + - 100 GiB snapshot with full random data + {{< figure src="/img/diagrams/snapshot/snapshot_hash_ssd_perf.png" >}} + + - Disk: 200 GiB throughput optimized HDD (st1) + - 30 GiB snapshot with full random data + {{< figure src="/img/diagrams/snapshot/snapshot_hash_hdd_perf.png" >}} + +## Recommendation + +The feature helps detect the data corruption in snapshot disk files of volumes. However, the checksum calculation negatively impacts the storage performance. To lower down the impact, the recommendations are +- Checksumming and checking snapshot disk files can be scheduled to off-peak hours by the global setting `snapshot-data-integrity-cronjob`. +- Disable the global setting `snapshot-data-integrity-immediate-check-after-snapshot-creation`. \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/support-bundle.md b/content/docs/1.5.1/advanced-resources/support-bundle.md new file mode 100644 index 000000000..a45115940 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/support-bundle.md @@ -0,0 +1,25 @@ +--- +title: Support Bundle +weight: 8 +--- + +Since v1.4.0, Longhorn replaced the in-house support bundle generation with a general-purpose [support bundle kit](https://github.com/rancher/support-bundle-kit). + +You can click the `Generate Support Bundle` at the bottom of Longhorn UI to download a zip file containing cluster manifests and logs. + +During support bundle generation, Longhorn will create a Deployment for the support bundle manager. + +> **Note:** The support bundle manager will use a dedicated `longhorn-support-bundle` service account and `longhorn-support-bundle` cluster role binding with `cluster-admin` access for bundle collection. + +With the support bundle, you can simulate a mocked Kubernetes cluster that is interactable with the `kubectl` command. See [simulator command](https://github.com/rancher/support-bundle-kit#simulator-command) for more details. + + +## Limitations + +Longhorn currently does not support concurrent generation of multiple support bundles. We recommend waiting until the completion of the ongoing support bundle before initiating a new one. 
If a new support bundle is created while another one is still in progress, Longhorn will overwrite the older support bundle. + + +## History +[Original Feature Request](https://github.com/longhorn/longhorn/issues/2759) + +Available since v1.4.0 diff --git a/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/_index.md b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/_index.md new file mode 100644 index 000000000..007953047 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/_index.md @@ -0,0 +1,5 @@ +--- +title: Support Managed Kubernetes Service +weight: 5 +--- + diff --git a/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-aks.md b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-aks.md new file mode 100644 index 000000000..70f85e8b1 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-aks.md @@ -0,0 +1,55 @@ +--- +title: Manage Node-Group on Azure AKS +weight: 2 +--- + +See [Create and manage multiple node pools for a cluster in Azure Kubernetes Service (AKS)](https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools) for more information. + +Following is an example to replace cluster nodes with a new storage size. + + +## Storage Expansion + +AKS does not support additional disk in its [template](https://docs.microsoft.com/en-us/azure/templates/Microsoft.ContainerService/2022-01-01/managedclusters?tabs=bicep#template-format). It is possible for manual disk attachment. Then raw device needs to be mounted either by manually mounting in VM or during launch with CustomScriptExtension that [is not supported](https://docs.microsoft.com/en-us/azure/aks/support-policies#user-customization-of-agent-nodes) in AKS. + +1. In Longhorn, set `replica-replenishment-wait-interval` to `0`. + +2. Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. + + ``` + AKS_NODEPOOL_NAME_NEW= + AKS_RESOURCE_GROUP= + AKS_CLUSTER_NAME= + AKS_DISK_SIZE_NEW= + AKS_NODE_NUM= + AKS_K8S_VERSION= + + az aks nodepool add \ + --resource-group ${AKS_RESOURCE_GROUP} \ + --cluster-name ${AKS_CLUSTER_NAME} \ + --name ${AKS_NODEPOOL_NAME_NEW} \ + --node-count ${AKS_NODE_NUM} \ + --node-osdisk-size ${AKS_DISK_SIZE_NEW} \ + --kubernetes-version ${AKS_K8S_VERSION} \ + --mode System + ``` + +3. Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + +4. Cordon and drain Kubernetes nodes in the old node-pool. + ``` + AKS_NODEPOOL_NAME_OLD= + + for n in `kubectl get nodes | grep ${AKS_NODEPOOL_NAME_OLD}- | awk '{print $1}'`; do + kubectl cordon $n && \ + kubectl drain $n --ignore-daemonsets --delete-emptydir-data + done + ``` + +5. Delete old node-pool. + ``` + az aks nodepool delete \ + --cluster-name ${AKS_CLUSTER_NAME} \ + --name ${AKS_NODEPOOL_NAME_OLD} \ + --resource-group ${AKS_RESOURCE_GROUP} + ``` diff --git a/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-eks.md b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-eks.md new file mode 100644 index 000000000..c4734a5f4 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-eks.md @@ -0,0 +1,63 @@ +--- +title: Manage Node-Group on AWS EKS +weight: 1 +--- + +EKS supports configuring the same launch template. 
The nodes in the node-group will be recycled by new nodes with new configurations when updating the launch template version. + +See [Launch template support](https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html) for more information. + +The following is an example to replace cluster nodes with new storage size. + + +## Storage Expansion + +1. In Longhorn, set `replica-replenishment-wait-interval` to `0`. + +2. Go to the launch template of the EKS cluster node-group. You can find in the EKS cluster tab `Configuration/Compute/` and click the launch template. + +3. Click `Modify template (Create new version)` in the `Actions` drop-down menu. + +4. Choose the `Source template version` in the `Launch template name and version description`. + +5. Follow steps to [Expand volume](#expand-volume), or [Create additional volume](#create-additional-volume). +> **Note:** If you choose to expand by [create additional volume](#create-additional-volume), the disks need to be manually added to the disk list of the nodes after the EKS cluster upgrade. + + +### Expand volume +1. Update the volume size in `Configure storage`. + +2. Click `Create template version` to save changes. + +3. Go to the EKS cluster node-group and change `Launch template version` in `Node Group configuration`. Track the status in the `Update history` tab. + + +### Create additional volume +1. Click `Advanced` then `Add new volume` in `Configure storage` and fill in the fields. + +2. Adjust the auto-mount script and add to `User data` in `Advanced details`. Make sure the `DEV_PATH` matches the `Device name` of the additional volume. + ``` + MIME-Version: 1.0 + Content-Type: multipart/mixed; boundary="==MYBOUNDARY==" + + --==MYBOUNDARY== + Content-Type: text/x-shellscript; charset="us-ascii" + + #!/bin/bash + + # https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-user-data + echo "Running custom user data script" + + DEV_PATH="/dev/sdb" + mkfs -t ext4 ${DEV_PATH} + + MOUNT_PATH="/mnt/longhorn" + mkdir ${MOUNT_PATH} + mount ${DEV_PATH} ${MOUNT_PATH} + ``` + +3. Click `Create template version` to save changes. + +4. Go to the EKS cluster node-group and change `Launch template version` in `Node Group configuration`. Track the status in the `Update history` tab. + +5. In Longhorn, add the path of the mounted disk into the disk list of the nodes. diff --git a/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-gke.md b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-gke.md new file mode 100644 index 000000000..f240a2447 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/manage-node-group-on-gke.md @@ -0,0 +1,57 @@ +--- +title: Manage Node-Group on GCP GKE +weight: 3 +--- + +See [Migrating workloads to different machine types](https://cloud.google.com/kubernetes-engine/docs/tutorials/migrating-node-pool) for more information. + +The following is an example to replace cluster nodes with new storage size. + + +## Storage Expansion + +GKE supports adding additional disk with `local-ssd-count`. However, each local SSD is fixed size to 375 GB. We suggest expanding the node size via node pool replacement. + +1. In Longhorn, set `replica-replenishment-wait-interval` to `0`. + +2. Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. 
+ + ``` + GKE_NODEPOOL_NAME_NEW= + GKE_REGION= + GKE_CLUSTER_NAME= + GKE_IMAGE_TYPE=Ubuntu + GKE_MACHINE_TYPE= + GKE_DISK_SIZE_NEW= + GKE_NODE_NUM= + + gcloud container node-pools create ${GKE_NODEPOOL_NAME_NEW} \ + --region ${GKE_REGION} \ + --cluster ${GKE_CLUSTER_NAME} \ + --image-type ${GKE_IMAGE_TYPE} \ + --machine-type ${GKE_MACHINE_TYPE} \ + --disk-size ${GKE_DISK_SIZE_NEW} \ + --num-nodes ${GKE_NODE_NUM} + + gcloud container node-pools list \ + --zone ${GKE_REGION} \ + --cluster ${GKE_CLUSTER_NAME} + ``` + +3. Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + +4. Cordon and drain Kubernetes nodes in the old node-pool. + ``` + GKE_NODEPOOL_NAME_OLD= + for n in `kubectl get nodes | grep ${GKE_CLUSTER_NAME}-${GKE_NODEPOOL_NAME_OLD}- | awk '{print $1}'`; do + kubectl cordon $n && \ + kubectl drain $n --ignore-daemonsets --delete-emptydir-data + done + ``` + +5. Delete old node-pool. + ``` + gcloud container node-pools delete ${GKE_NODEPOOL_NAME_OLD}\ + --zone ${GKE_REGION} \ + --cluster ${GKE_CLUSTER_NAME} + ``` diff --git a/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-aks.md b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-aks.md new file mode 100644 index 000000000..66e8202d2 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-aks.md @@ -0,0 +1,60 @@ +--- +title: Upgrade Kubernetes on Azure AKS +weight: 5 +--- + +AKS provides `az aks upgrade` for in-places nodes upgrade by node reimaged, but this will cause the original Longhorn disks missing, then there will be no disks allowing replica rebuilding in upgraded nodes anymore. + +We suggest using node-pool replacement to upgrade the agent nodes but use `az aks upgrade` for control plane nodes to ensure data safety. + +1. In Longhorn, set `replica-replenishment-wait-interval` to `0`. + +2. Upgrade AKS control plane. + ``` + AKS_RESOURCE_GROUP= + AKS_CLUSTER_NAME= + AKS_K8S_VERSION_UPGRADE= + + az aks upgrade \ + --resource-group ${AKS_RESOURCE_GROUP} \ + --name ${AKS_CLUSTER_NAME} \ + --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ + --control-plane-only + ``` + +3. Add a new node-pool. + + ``` + AKS_NODEPOOL_NAME_NEW= + AKS_DISK_SIZE= + AKS_NODE_NUM= + + az aks nodepool add \ + --resource-group ${AKS_RESOURCE_GROUP} \ + --cluster-name ${AKS_CLUSTER_NAME} \ + --name ${AKS_NODEPOOL_NAME_NEW} \ + --node-count ${AKS_NODE_NUM} \ + --node-osdisk-size ${AKS_DISK_SIZE} \ + --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ + --mode System + ``` + +4. Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + +5. Cordon and drain Kubernetes nodes in the old node-pool. + ``` + AKS_NODEPOOL_NAME_OLD= + + for n in `kubectl get nodes | grep ${AKS_NODEPOOL_NAME_OLD}- | awk '{print $1}'`; do + kubectl cordon $n && \ + kubectl drain $n --ignore-daemonsets --delete-emptydir-data + done + ``` + +6. Delete old node-pool. 
+ ``` + az aks nodepool delete \ + --cluster-name ${AKS_CLUSTER_NAME} \ + --name ${AKS_NODEPOOL_NAME_OLD} \ + --resource-group ${AKS_RESOURCE_GROUP} + ``` diff --git a/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-eks.md b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-eks.md new file mode 100644 index 000000000..540b5caef --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-eks.md @@ -0,0 +1,10 @@ +--- +title: Upgrade Kubernetes on AWS EKS +weight: 4 +--- + +In Longhorn, set `replica-replenishment-wait-interval` to `0`. + +See [Updating a cluster](https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html) for instructions. + +> **Note:** If you have created [addition disks](../manage-node-group-on-eks#create-additional-volume) for Longhorn, you will need to manually add the path of the mounted disk into the disk list of the upgraded nodes. \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-gke.md b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-gke.md new file mode 100644 index 000000000..7fa9f191a --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-gke.md @@ -0,0 +1,8 @@ +--- +title: Upgrade Kubernetes on GCP GKE +weight: 6 +--- + +In Longhorn, set `replica-replenishment-wait-interval` to `0`. + +See [Upgrading the cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster#upgrading_the_cluster) and [Upgrading node pools](https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster#upgrading-nodes) for instructions. diff --git a/content/docs/1.5.1/advanced-resources/system-backup-restore/_index.md b/content/docs/1.5.1/advanced-resources/system-backup-restore/_index.md new file mode 100644 index 000000000..9f8fcf5cb --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/system-backup-restore/_index.md @@ -0,0 +1,16 @@ +--- +title: Longhorn System Backup And Restore +weight: 4 +--- + +> Before v1.4.0, you can restore Longhorn with third-party tools. + +- [Restore to a cluster contains data using Rancher snapshot](./restore-to-a-cluster-contains-data-using-rancher-snapshot) +- [Restore to a new cluster using Velero](./restore-to-a-new-cluster-using-velero) + +> Since v1.4.0, Longhorn introduced out-of-the-box Longhorn system backup and restore. +> - Longhorn's custom resources will be backed up and bundled into a single system backup file, then saved to the remote backup target. +> - Later, you can choose a system backup to restore to a new cluster or restore to an existing cluster. 
+ +- [Backup Longhorn system](./backup-longhorn-system) +- [Restore Longhorn system](./restore-longhorn-system) diff --git a/content/docs/1.5.1/advanced-resources/system-backup-restore/backup-longhorn-system.md b/content/docs/1.5.1/advanced-resources/system-backup-restore/backup-longhorn-system.md new file mode 100644 index 000000000..a6ef39006 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/system-backup-restore/backup-longhorn-system.md @@ -0,0 +1,123 @@ +--- +title: Backup Longhorn System +weight: 1 +--- + +- [What is in the Longhorn system backup bundle](#longhorn-system-backup-bundle) +- [How to create a Longhorn system backup](#create-longhorn-system-backup) + - [Prerequisite](#prerequisite) + - [Configuration](#configuration) + - [Using Longhorn UI](#using-longhorn-ui) + - [Using kubectl command](#using-kubectl-command) +- [How to delete Longhorn system backup](#delete-longhorn-system-backup) + - [Using Longhorn UI](#using-longhorn-ui-1) + - [Using kubectl command](#using-kubectl-command-1) +- [History](#history) + +## Longhorn System Backup Bundle + +Longhorn system backup creates a resource bundle and uploads it to the remote backup target. + +It includes below resources associating with the Longhorn system: +- ClusterRoles +- ClusterRoleBindings +- ConfigMaps +- CustomResourceDefinitions +- DaemonSets +- Deployments +- EngineImages +- PersistentVolumes +- PersistentVolumeClaims +- PodSecurityPolicies +- RecurringJobs +- Roles +- RoleBindings +- Settings +- Services +- ServiceAccounts +- StorageClasses +- Volumes + +> **Warning:** Longhorn does not backup `BackingImages`. We will improve this part in the future. See [Restore Longhorn System - Prerequisite](../restore-longhorn-system/#prerequisite) for restoring volumes created with the backing image. + +> **Note:** Longhorn does not backup `Nodes`. The Longhorn manager on the target cluster is responsible for creating its own Longhorn `Node` custom resources. + +> **Note:** Longhorn system backup bundle only includes resources operated by Longhorn. +> Here is an example of a cluster workload with a bare `Pod` workload. The system backup will collect the `PersistentVolumeClaim`, `PersistentVolume`, and `Volume`. The system backup will exclude the `Pod` during system backup resource collection. + +## Create Longhorn System Backup + +You can create a Longhorn system backup using the Longhorn UI. Or with the `kubectl` command. + +### Prerequisite + +- [Set the backup target](../../../snapshots-and-backups/backup-and-restore/set-backup-target). Longhorn saves the system backups to the remote backup store. You will see an error during creation when the backup target is unset. + + > **Note:** Unsetting the backup target clears the existing `SystemBackup` custom resource. Longhorn syncs to the remote backup store after setting the backup target. Another cluster can also sync to the same list of system backups when the backup target is the same. + +- Create a backup for all volumes (optional). + + > **Note:** Longhorn system restores volume with the latest backup. We recommend updating the last backup for all volumes. By taking volume backups, you ensure that the data is up-to-date with the system backup. For more information, please refer to the [Configuration - Volume Backup Policy](#volume-backup-policy) section. + +### Configuration + +#### Volume Backup Policy +The Longhorn system backup offers the following volume backup policies: + - `if-not-present`: Longhorn will create a backup for volumes that currently lack a backup. 
+ - `always`: Longhorn will create a backup for all volumes, regardless of their existing backups. + - `disabled`: Longhorn will not create any backups for volumes. + +### Using Longhorn UI + +1. Go to the `System Backup` page in the `Setting` drop-down list. +1. Click `Create` under `System Backup`. +1. Give a `Name` for the system backup. +1. Select a `Volume Backup Policy` for the system backup. +1. The system backup will be ready to use when the state changes to `Ready`. + +### Using `kubectl` Command + +1. Execute `kubectl create` to create a Longhorn `SystemBackup` custom resource. + ```yaml + apiVersion: longhorn.io/v1beta2 + kind: SystemBackup + metadata: + name: demo + namespace: longhorn-system + spec: + volumeBackupPolicy: if-not-present + ``` +1. The system backup will be ready to use when the state changes to `Ready`. + ``` + > kubectl -n longhorn-system get systembackup + NAME VERSION STATE CREATED + demo v1.4.0 Ready 2022-11-24T04:23:24Z + ``` + +## Delete Longhorn System Backup + +You can delete the Longhorn system backup in the remote backup target using the Longhorn UI. Or with the `kubectl` command. + +### Using Longhorn UI + +1. Go to the `System Backup` page in the `Setting` drop-down list. +1. Delete a single system backup in the `Operation` drop-down menu next to the system backup. Or delete in batch with the `Delete` button. + + > **Note:** Deleting the system backup will also make a deletion in the backup store. + +### Using `kubectl` Command + +1. Execute `kubectl delete` to delete a Longhorn `SystemBackup` custom resource. + ``` + > kubectl -n longhorn-system get systembackup + NAME VERSION STATE CREATED + demo v1.4.0 Ready 2022-11-24T04:23:24Z + + > kubectl -n longhorn-system delete systembackup/demo + systembackup.longhorn.io "demo" deleted + ``` + +## History +[Original Feature Request](https://github.com/longhorn/longhorn/issues/1455) + +Available since v1.4.0 \ No newline at end of file diff --git a/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-longhorn-system.md b/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-longhorn-system.md new file mode 100644 index 000000000..87cb66181 --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-longhorn-system.md @@ -0,0 +1,134 @@ +--- +title: Restore Longhorn System +weight: 2 +--- + +- [What does the Longhorn system restore rollout to the cluster](#longhorn-system-restore-rollouts) +- [What are the limitations](#limitations) + - [Restore Path](#restore-path) +- [How to restore from Longhorn system backup](#create-longhorn-system-restore) + - [Prerequisite](#prerequisite) + - [Using Longhorn UI](#using-longhorn-ui) + - [Using kubectl command](#using-kubectl-command) +- [How to delete Longhorn system restore](#delete-longhorn-system-restore) + - [Using Longhorn UI](#using-longhorn-ui-1) + - [Using kubectl command](#using-kubectl-command-1) +- [How to restart Longhorn System Restore](#restart-longhorn-system-restore) +- [What settings are configurable](#configurable-settings) +- [How to troubleshoot](#troubleshoot) +- [History](#history) + +## Longhorn System Restore Rollouts + +- Longhorn restores the resource from the [Longhorn System Backup Bundle](../backup-longhorn-system#longhorn-system-backup-bundle). +- Longhorn does not restore existing `Volumes` and their associated `PersistentVolume` and `PersistentVolumeClaim`. +- Longhorn automatically restores a `Volume` from its latest backup. 
+- To prevent overwriting eligible settings, Longhorn does not restore the `ConfigMap/longhorn-default-setting`. +- Longhorn does not restore [configurable settings](#configurable-settings). + +## Limitations +### Restore Path + +Longhorn does not support cross-major/minor version system restore except for upgrade failures, ex: 1.4.x -> 1.5. +## Create Longhorn System Restore + +You can restore the Longhorn system using Longhorn UI. Or with the `kubectl` command. + +### Prerequisite + +- A running Longhorn cluster for Longhorn to roll out the resources in the system backup bundle. +- Set up the `Nodes` and disk tags for `StorageClass`. +- Have a Longhorn system backup. + + See [Backup Longhorn System - Create Longhorn System Backup](../backup-longhorn-system#create-longhorn-system-backup) for instructions. +- Have volume `BackingImages` available in the cluster. + + In case of the `BackingImage` absence, Longhorn will skip the restoration for that `Volume` and its `PersistentVolume` and `PersistentVolumeClaim`. +- All existing `Volumes` are detached. + +### Using Longhorn UI + +1. Go to the `System Backup` page in the `Setting`. +1. Select a system backup to restore. +1. Click `Restore` in the `Operation` drop-down menu. +1. Give a `Name` for the system restore. +1. The system restore starts and show the `Completed` state when done. + +## Using `kubectl` Command + +1. Find the Longhorn `SystemBackup` to restore. + ``` + > kubectl -n longhorn-system get systembackup + NAME VERSION STATE CREATED + demo v1.4.0 Ready 2022-11-24T04:23:24Z + demo-2 v1.4.0 Ready 2022-11-24T05:00:59Z + ``` +1. Execute `kubectl create` to create a Longhorn `SystemRestore` of the `SystemBackup`. + ```yaml + apiVersion: longhorn.io/v1beta2 + kind: SystemRestore + metadata: + name: restore-demo + namespace: longhorn-system + spec: + systemBackup: demo + ``` +1. The system restore starts. +1. The `SystemRestore` change to state `Completed` when done. + ``` + > kubectl -n longhorn-system get systemrestore + NAME STATE AGE + restore-demo Completed 59s + ``` + +## Delete Longhorn System Restore + +> **Warning:** Deleting the SystemRestore also deletes the associated job and will abort the remaining resource rollouts. You can [Restart the Longhorn System Restore](#restart-longhorn-system-restore) to roll out the remaining resources. + +You can abort or remove a completed Longhorn system restore using Longhorn UI. Or with the `kubectl` command. + +### Using Longhorn UI + +1. Go to the `System Backup` page in the `Setting`. +1. Delete a single system restore in the `Operation` drop-down menu next to the system restore. Or delete in batch with the `Delete` button. + +### Using `kubectl` Command + +1. Execute `kubectl delete` to delete a Longhorn `SystemRestore`. + ``` + > kubectl -n longhorn-system get systemrestore + NAME STATE AGE + restore-demo Completed 2m37s + + > kubectl -n longhorn-system delete systemrestore/restore-demo + systemrestore.longhorn.io "restore-demo" deleted + ``` + +## Restart Longhorn System Restore + +1. [Delete Longhorn System Restore](#delete-longhorn-system-restore) that is in progress. +1. [Create Longhorn System Restore](#create-longhorn-system-restore). + +## Configurable Settings + +Some settings are excluded as configurable before the Longhorn system restore. 
+- [Concurrent volume backup restore per node limit](../../../references/settings/#concurrent-volume-backup-restore-per-node-limit) +- [Concurrent replica rebuild per node limit](../../../references/settings/#concurrent-replica-rebuild-per-node-limit) +- [Backup Target](../../../references/settings/#backup-target) +- [Backup Target Credential Secret](../../../references/settings/#backup-target-credential-secret) + +## Troubleshoot + +### System Restore Hangs + +1. Check the longhorn-system-rollout Pod log for any errors. +``` +> kubectl -n longhorn-system logs --selector=job-name=longhorn-system-rollout- +``` +1. Resolve if the issue is identifiable, ex: remove the problematic restoring resource. +1. [Restart the Longhorn system restore](#restart-longhorn-system-restore). + +## History +[Original Feature Request](https://github.com/longhorn/longhorn/issues/1455) + +Available since v1.4.0 diff --git a/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-to-a-cluster-contains-data-using-Rancher-snapshot.md b/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-to-a-cluster-contains-data-using-Rancher-snapshot.md new file mode 100644 index 000000000..6c512923a --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-to-a-cluster-contains-data-using-Rancher-snapshot.md @@ -0,0 +1,54 @@ +--- +title: Restore to a cluster contains data using Rancher snapshot +weight: 4 +--- + +This doc describes what users need to do after restoring the cluster with a Rancher snapshot. + +## Assumptions: +- **Most of the data and the underlying disks still exist** in the cluster before the restore and can be directly reused then. +- There is a backupstore holding all volume data. +- The setting [`Disable Revision Counter`](../../../references/settings/#disable-revision-counter) is false. (It's false by default.) Otherwise, users need to manually check if the data among volume replicas are consistent, or directly restore volumes from backup. + +## Expectation: +- All settings and node & disk configs will be restored. +- As long as the valid data still exists, the volumes can be recovered without using a backup. In other words, we will try to avoid restoring backups, which may help reduce Recovery Time Objective (RTO) as well as save bandwidth. +- Detect the invalid or out-of-sync replicas as long as the related volume still contains a valid replica after the restore. + +## Behaviors & Requirement of Rancher restore +- According to [the Rancher restore article](https://rancher.com/blog/2018/2018-05-30-recover-rancher-kubernetes-cluster-from-backup/), you have to restart the Kubernetes components on all nodes. Otherwise, there will be tons of resource update conflicts in Longhorn. + +## Actions after the restore +- Restart all Kubernetes components for all nodes. See the above link for more details. + +- Kill all longhorn manager pods then Kubernetes will automatically restart them. Wait for conflicts in longhorn manager pods to disappear. + +- All volumes may be reattached. If a Longhorn volume is used by a single pod, users need to shut down then recreate it. For Deployments or Statefulsets, Longhorn will automatically kill then restart the related pods. + +- If the following happens after the snapshot and before the cluster restore: + - A volume is unchanged: Users don't need to do anything. + - The data is updated: Users don't need to do anything typically. Longhorn will automatically fail the replicas that don't contain the latest data. 
+ - A new volume is created: This volume will disappear after the restore. Users need to recreate a new volume, launch [a single replica volume](../../data-recovery/export-from-replica) based on the replica of the disappeared volume, then transfer the data to the new volume. + - A volume is deleted: Since the data is cleaned up when the volume is removed, the restored volume contains no data. Users may need to re-delete it. + - For DR volumes: Users don't need to do anything. Longhorn will redo a full restore. + - Some operations are applied for a volume: + - Backup: The backup info of the volume should be resynced automatically. + - Snapshot: The snapshot info of the volume should be resynced once the volume is attached. + - Replica rebuilding & replica removal: + - If there are new replicas rebuilt, those replicas will disappear from the Longhorn system after the restoring. Users need to clean up the replica data manually, or use the data directories of these replicas to export a single replica volume then do data recovery if necessary. + - If there are some failed/removed replicas and there is at least one replica keeping healthy, those failed/removed replicas will be back after the restoration. Then Longhorn can detect these restored replicas do not contain any data, and copy the latest data from the healthy replica to these replicas. + - If all replicas are replaced by new replicas after the snapshot, the volume will contain invalid replicas only after the restore. Then users need to export [a single replica volume](../../data-recovery/export-from-replica) for the data recovery. + - Engine image upgrade: Users need to redo the upgrade. + - Expansion: The spec size of the volume will be smaller than the current size. This is like someone requesting volume shrinking but actually Longhorn will refuse to handle it internally. To recover the volume, users need to scale down the workloads and re-do the expansion. + + - **Notice**: If users don't know how to recover a problematic volume, the simplest way is always restoring a new volume from backup. + +- If the Longhorn system is upgraded after the snapshot, the new settings and the modifications on the node config will disappear. Users need to re-do the upgrade, then re-modify the settings and node configurations. + +- If a node is deleted from Longhorn system after the snapshot, the node won't be back, but the pods on the removed node will be restored. Users need to manually clean up them since these pod may get stuck in state `Terminating`. +- If a node to added to Longhorn system after the snapshot, Longhorn should automatically relaunch all necessary workloads on the node after the cluster restore. But users should be aware that all new replicas or engines on this node will be gone after the restore. + + +## References +- The related GitHub issue is https://github.com/longhorn/longhorn/issues/2228. + In this GitHub post, one user is providing a way that restores the Longhorn to a new cluster that doesn't contain any data. 
diff --git a/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-to-a-new-cluster-using-velero.md b/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-to-a-new-cluster-using-velero.md new file mode 100644 index 000000000..f67f9641a --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/system-backup-restore/restore-to-a-new-cluster-using-velero.md @@ -0,0 +1,51 @@ +--- +title: Restore to a new cluster using Velero +weight: 4 +--- + +This doc instructs how users can restore workloads with Longhorn system to a new cluster via Velero. + +> **Note:** Need to use [Velero CSI plugin](https://github.com/vmware-tanzu/velero-plugin-for-csi) >= 0.4 to ensure restoring PersistentVolumeClaim successfully. Visit [here](/kb/troubleshooting-restore-pvc-stuck-using-velero-csi-plugin-version-lower-than-0.4) to get more information. + + +## Assumptions: +- A new cluster means there is **no Longhorn volume data** in it. +- There is a remote backup target holds all Longhorn volume data. +- There is a remote backup server that can store the cluster backups created by Velero. + +## Expectation: +- All settings will be restored. But the node & disk configurations won't be applied. +- All workloads using Longhorn volumes will get started after the volumes are restored from the remote backup target. + +## Workflow + +### Create backup for the old cluster +1. Install Velero into a cluster using Longhorn. +2. Create backups for all Longhorn volumes. +3. Use Velero to create a cluster backup. Here, some Longhorn resources should be excluded from the cluster backup: + ```bash + velero backup create lh-cluster --exclude-resources persistentvolumes,persistentvolumeclaims,backuptargets.longhorn.io,backupvolumes.longhorn.io,backups.longhorn.io,nodes.longhorn.io,volumes.longhorn.io,engines.longhorn.io,replicas.longhorn.io,backingimagedatasources.longhorn.io,backingimagemanagers.longhorn.io,backingimages.longhorn.io,sharemanagers.longhorn.io,instancemanagers.longhorn.io,engineimages.longhorn.io + ``` +### Restore Longhorn and workloads to a new cluster +1. Install Velero with the same remote backup sever for the new cluster. +2. Restore the cluster backup. e.g., + ```bash + velero restore create --from-backup lh-cluster + ``` +3. Removing all old instance manager pods and backing image manager pods from namespace `longhorn-system`. These old pods should be created by Longhorn rather than Velero and there should be corresponding CRs for them. The pods are harmless but they would lead to the endless logs printed in longhorn-manager pods. e.g.,: + ```log + [longhorn-manager-q6n7x] time="2021-12-20T10:42:49Z" level=warning msg="Can't find instance manager for pod instance-manager-r-1f19ecb0, may be deleted" + [longhorn-manager-q6n7x] time="2021-12-20T10:42:49Z" level=warning msg="Can't find instance manager for pod instance-manager-e-6c3be222, may be deleted" + [longhorn-manager-ldlvw] time="2021-12-20T10:42:55Z" level=warning msg="Can't find instance manager for pod instance-manager-e-bbf80f76, may be deleted" + [longhorn-manager-ldlvw] time="2021-12-20T10:42:55Z" level=warning msg="Can't find instance manager for pod instance-manager-r-3818fdca, may be deleted" + ``` +4. Re-config nodes and disks for the restored Longhorn system if necessary. +5. Re-create backing images if necessary. +6. Restore all Longhorn volumes from the remote backup target. +7. 
If there are RWX backup volumes, users need to manually update the access mode to `ReadWriteMany` since all restored volumes are mode `ReadWriteOnce` by default. +8. Create PVCs and PVs with previous names for the restored volumes. + +Note: We will enhance Longhorn system so that users don't need to apply step3 and step8 in the future. + +## References +- The related GitHub issue is https://github.com/longhorn/longhorn/issues/3367 diff --git a/content/docs/1.5.1/advanced-resources/troubleshooting.md b/content/docs/1.5.1/advanced-resources/troubleshooting.md new file mode 100644 index 000000000..06def27aa --- /dev/null +++ b/content/docs/1.5.1/advanced-resources/troubleshooting.md @@ -0,0 +1,68 @@ +--- +title: Troubleshooting +weight: 7 +--- + +> You can generate a support bundle file for offline troubleshooting. See [Support Bundle](../support-bundle) for detail. + +## Common issues +### Volume can be attached/detached from UI, but Kubernetes Pod/StatefulSet etc cannot use it + +#### Using with Flexvolume Plugin +Check if the volume plugin directory has been set correctly. This is automatically detected unless user explicitly set it. + +By default, Kubernetes uses `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/`, as stated in the [official document](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-storage/flexvolume.md/#prerequisites). + +Some vendors choose to change the directory for various reasons. For example, GKE uses `/home/kubernetes/flexvolume` instead. + +The correct directory can be found by running `ps aux|grep kubelet` on the host and check the `--volume-plugin-dir` parameter. If there is none, the default `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/` will be used. + +## Troubleshooting guide + +There are a few components in Longhorn: Manager, Engine, Driver and UI. By default, all of those components run as pods in the `longhorn-system` namespace in the Kubernetes cluster. + +Most of the logs are included in the Support Bundle. You can click the **Generate Support Bundle** link at the bottom of the UI to download a zip file that contains Longhorn-related configuration and logs. + +One exception is the `dmesg`, which needs to be retrieved from each node by the user. + +### UI +Make use of the Longhorn UI is a good start for the troubleshooting. For example, if Kubernetes cannot mount one volume correctly, after stop the workload, try to attach and mount that volume manually on one node and access the content to check if volume is intact. + +Also, the event logs in the UI dashboard provides some information of probably issues. Check for the event logs in `Warning` level. + +### Manager and Engines +You can get the logs from the Longhorn Manager and Engines to help with troubleshooting. The most useful logs are the ones from `longhorn-manager-xxx`, and the logs inside Longhorn instance managers, e.g. `instance-manager-xxxx`, `instance-manager-e-xxxx` and `instance-manager-r-xxxx`. + +Since normally there are multiple Longhorn Managers running at the same time, we recommend using [kubetail,](https://github.com/johanhaleby/kubetail) which is a great tool to keep track of the logs of multiple pods. To track the manager logs in real time, you can use: + +``` +kubetail longhorn-manager -n longhorn-system +``` + + +### CSI driver + +For the CSI driver, check the logs for `csi-attacher-0` and `csi-provisioner-0`, as well as containers in `longhorn-csi-plugin-xxx`. 
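For example, assuming the default `longhorn-system` namespace, the logs can be collected with commands along these lines (the exact pod and container names may differ slightly in your cluster):
```
kubectl -n longhorn-system logs csi-attacher-0
kubectl -n longhorn-system logs csi-provisioner-0

# List the CSI plugin pods, then fetch the logs of one of them
kubectl -n longhorn-system get pods | grep longhorn-csi-plugin
kubectl -n longhorn-system logs <longhorn-csi-plugin-pod> -c longhorn-csi-plugin
```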
+ +### Flexvolume Driver + +The FlexVolume driver is deprecated as of Longhorn v0.8.0 and should no longer be used. + +First check where the driver has been installed on the node. Check the log of `longhorn-driver-deployer-xxxx` for that information. + +Then check the kubelet logs. The FlexVolume driver itself doesn't run inside the container. It would run along with the kubelet process. + +If kubelet is running natively on the node, you can use the following command to get the logs: +``` +journalctl -u kubelet +``` + +Or if kubelet is running as a container (e.g. in RKE), use the following command instead: +``` +docker logs kubelet +``` + +For even more detailed logs of Longhorn FlexVolume, run the following command on the node or inside the container (if kubelet is running as a container, e.g. in RKE): +``` +touch /var/log/longhorn_driver.log +``` diff --git a/content/docs/1.5.1/best-practices.md b/content/docs/1.5.1/best-practices.md new file mode 100644 index 000000000..10b0c013a --- /dev/null +++ b/content/docs/1.5.1/best-practices.md @@ -0,0 +1,130 @@ +--- +title: Best Practices +weight: 5 +--- + +We recommend the following setup for deploying Longhorn in production. + +- [Minimum Recommended Hardware](#minimum-recommended-hardware) +- [Architecture](#architecture) +- [Operating System](#operating-system) +- [Node and Disk Setup](#node-and-disk-setup) + - [Use a Dedicated Disk](#use-a-dedicated-disk) + - [Minimal Available Storage and Over-provisioning](#minimal-available-storage-and-over-provisioning) + - [Disk Space Management](#disk-space-management) + - [Setting up Extra Disks](#setting-up-extra-disks) +- [Configuring Default Disks Before and After Installation](#configuring-default-disks-before-and-after-installation) +- [Deploying Workloads](#deploying-workloads) +- [Volume Maintenance](#volume-maintenance) +- [Guaranteed Instance Manager CPU](#guaranteed-instance-manager-cpu) +- [StorageClass](#storageclass) +- [Scheduling Settings](#scheduling-settings) + - [Replica Node Level Soft Anti-Affinity](#replica-node-level-soft-anti-affinity) + - [Allow Volume Creation with Degraded Availability](#allow-volume-creation-with-degraded-availability) + +## Minimum Recommended Hardware + +- 3 nodes +- 4 vCPUs per node +- 4 GiB per node +- SSD/NVMe or similar performance block device on the node for storage (recommended) +- HDD/Spinning Disk or similar performance block device on the node for storage (verified) + - 500/250 max IOPS per volume (1 MiB I/O) + - 500/250 max throughput per volume (MiB/s) + +## Architecture + +Longhorn supports the following architectures: + +1. AMD64 +1. ARM64 +1. s390x (experimental) + +## Operating System + +> **Note:** CentOS Linux has been removed from the verified OS list below, as it has been discontinued in favor of CentOS Stream [[ref](https://www.redhat.com/en/blog/faq-centos-stream-updates#Q5)], a rolling-release Linux distribution. Our focus for verifying RHEL-based downstream open source distributions will be enterprise-grade, such as Rocky and Oracle Linux. + +The following Linux OS distributions and versions have been verified during the v{{< current-version >}} release testing. However, this does not imply that Longhorn exclusively supports these distributions. Essentially, Longhorn should function well on any certified Kubernetes cluster running on Linux nodes with a wide range of general-purpose operating systems, as well as verified container-optimized operating systems like SLE Micro. + +| No. 
| OS | Versions +|-----|--------------| -------- +| 1. | Ubuntu | 22.04 +| 2. | SLES | 15 SP4 +| 3. | SLE Micro | 5.3 +| 4. | RHEL | 9.1 +| 5. | Oracle Linux | 9.1 +| 6. | Rocky Linux | 9.2 + +## Node and Disk Setup + +We recommend the following setup for nodes and disks. + +### Use a Dedicated Disk + +It's recommended to dedicate a disk for Longhorn storage for production, instead of using the root disk. + +### Minimal Available Storage and Over-provisioning + +If you need to use the root disk, use the default `minimal available storage percentage` setup which is 25%, and set `overprovisioning percentage` to 200% to minimize the chance of DiskPressure. + +If you're using a dedicated disk for Longhorn, you can lower the setting `minimal available storage percentage` to 10%. + +For the Over-provisioning percentage, it depends on how much space your volume uses on average. For example, if your workload only uses half of the available volume size, you can set the Over-provisioning percentage to `200`, which means Longhorn will consider the disk to have twice the schedulable size as its full size minus the reserved space. + +### Disk Space Management + +Since Longhorn doesn't currently support sharding between the different disks, we recommend using [LVM](https://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)) to aggregate all the disks for Longhorn into a single partition, so it can be easily extended in the future. + +### Setting up Extra Disks + +Any extra disks must be written in the `/etc/fstab` file to allow automatic mounting after the machine reboots. + +Don't use a symbolic link for the extra disks. Use `mount --bind` instead of `ln -s` and make sure it's in the `fstab` file. For details, see [the section about multiple disk support.](../volumes-and-nodes/multidisk/#use-an-alternative-path-for-a-disk-on-the-node) + +## Configuring Default Disks Before and After Installation + +To use a directory other than the default `/var/lib/longhorn` for storage, the `Default Data Path` setting can be changed before installing the system. For details on changing pre-installation settings, refer to [this section.](../advanced-resources/deploy/customizing-default-settings) + +The [Default node/disk configuration](../advanced-resources/default-disk-and-node-config) feature can be used to customize the default disk after installation. Customizing the default configurations for disks and nodes is useful for scaling the cluster because it eliminates the need to configure Longhorn manually for each new node if the node contains more than one disk, or if the disk configuration is different for new nodes. Remember to enable `Create default disk only on labeled node` if applicable. + +## Deploying Workloads + +If you're using `ext4` as the filesystem of the volume, we recommend adding a liveness check to workloads to help automatically recover from a network-caused interruption, a node reboot, or a Docker restart. See [this section](../high-availability/recover-volume/) for details. + +## Volume Maintenance + +We highly recommend using the built-in backup feature of Longhorn. + +For each volume, schedule at least one recurring backup. If you must run Longhorn in production without a backupstore, then schedule at least one recurring snapshot for each volume. + +Longhorn system will create snapshots automatically when rebuilding a replica. Recurring snapshots or backups can also automatically clean up the system-generated snapshot. 
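As a sketch of what such a schedule can look like, the following `RecurringJob` takes a daily backup of every volume in the `default` group and keeps the last seven backups. The name, cron expression, and retain count are placeholders to adjust for your environment:
```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"   # run at 2:00 AM every day
  task: "backup"      # use "snapshot" to create snapshots instead of backups
  groups:
  - default           # applies to all volumes in the default group
  retain: 7           # keep the last 7 backups created by this job
  concurrency: 2      # number of volumes the job may process concurrently
  labels: {}
```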
+ +## Guaranteed Instance Manager CPU + +We recommend setting the CPU request for Longhorn instance manager pods. + +The `Guaranteed Instance Manager CPU` setting allows you to reserve a percentage of a node's total allocatable CPU for all instance manager pods. + +You can also set a specific milli CPU value for instance manager pods on a particular node by updating the node's `Instance Manager CPU Request` field. + +> **Note:** This field will overwrite the above setting for the specified node. + +Refer to [Guaranteed Instance Manager CPU](../references/settings/#guaranteed-instance-manager-cpu) for more details. + +## StorageClass + +We don't recommend modifying the default StorageClass named `longhorn`, since the change of parameters might cause issues during an upgrade later. If you want to change the parameters set in the StorageClass, you can create a new StorageClass by referring to the [StorageClass examples](../references/examples/#storageclass). + +## Scheduling Settings + +### Replica Node Level Soft Anti-Affinity + +> Recommend: `false` + +This setting should be set to `false` in production environment to ensure the best availability of the volume. Otherwise, one node down event may bring down more than one replicas of a volume. + +### Allow Volume Creation with Degraded Availability + +> Recommend: `false` + +This setting should be set to `false` in production environment to ensure every volume have the best availability when created. Because with the setting set to `true`, the volume creation won't error out even there is only enough room to schedule one replica. So there is a risk that the cluster is running out of the spaces but the user won't be made aware immediately. diff --git a/content/docs/1.5.1/concepts.md b/content/docs/1.5.1/concepts.md new file mode 100644 index 000000000..d5b456b27 --- /dev/null +++ b/content/docs/1.5.1/concepts.md @@ -0,0 +1,414 @@ +--- +title: Architecture and Concepts +weight: 3 +--- + +Longhorn creates a dedicated storage controller for each volume and synchronously replicates the volume across multiple replicas stored on multiple nodes. + +The storage controller and replicas are themselves orchestrated using Kubernetes. + +For an overview of Longhorn features, refer to [this section.](../what-is-longhorn) + +For the installation requirements, go to [this section.](../deploy/install/#installation-requirements) + +> This section assumes familiarity with Kubernetes persistent storage concepts. For more information on these concepts, refer to the [appendix.](#appendix-how-persistent-storage-works-in-kubernetes) For help with the terminology used in this page, refer to [this section.](../terminology) + +- [1. Design](#1-design) + - [1.1. The Longhorn Manager and the Longhorn Engine](#11-the-longhorn-manager-and-the-longhorn-engine) + - [1.2. Advantages of a Microservices Based Design](#12-advantages-of-a-microservices-based-design) + - [1.3. CSI Driver](#13-csi-driver) + - [1.4. CSI Plugin](#14-csi-plugin) + - [1.5. The Longhorn UI](#15-the-longhorn-ui) +- [2. Longhorn Volumes and Primary Storage](#2-longhorn-volumes-and-primary-storage) + - [2.1. Thin Provisioning and Volume Size](#21-thin-provisioning-and-volume-size) + - [2.2. Reverting Volumes in Maintenance Mode](#22-reverting-volumes-in-maintenance-mode) + - [2.3. Replicas](#23-replicas) + - [2.3.1. How Read and Write Operations Work for Replicas](#231-how-read-and-write-operations-work-for-replicas) + - [2.3.2. How New Replicas are Added](#232-how-new-replicas-are-added) + - [2.3.3. 
How Faulty Replicas are Rebuilt](#233-how-faulty-replicas-are-rebuilt) + - [2.4. Snapshots](#24-snapshots) + - [2.4.1. How Snapshots Work](#241-how-snapshots-work) + - [2.4.2. Recurring Snapshots](#242-recurring-snapshots) + - [2.4.3. Deleting Snapshots](#243-deleting-snapshots) + - [2.4.4. Storing Snapshots](#244-storing-snapshots) + - [2.4.5. Crash Consistency](#245-crash-consistency) +- [3. Backups and Secondary Storage](#3-backups-and-secondary-storage) + - [3.1. How Backups Work](#31-how-backups-work) + - [3.2. Recurring Backups](#32-recurring-backups) + - [3.3. Disaster Recovery Volumes](#33-disaster-recovery-volumes) + - [3.4. Backupstore Update Intervals, RTO and RPO](#34-backupstore-update-intervals-rto-and-rpo) +- [Appendix: How Persistent Storage Works in Kubernetes](#appendix-how-persistent-storage-works-in-kubernetes) + - [How Kubernetes Workloads use New and Existing Persistent Storage](#how-kubernetes-workloads-use-new-and-existing-persistent-storage) + - [Existing Storage Provisioning](#existing-storage-provisioning) + - [Dynamic Storage Provisioning](#dynamic-storage-provisioning) + - [Horizontal Scaling for Kubernetes Workloads with Persistent Storage](#horizontal-scaling-for-kubernetes-workloads-with-persistent-storage) + +# 1. Design + +The Longhorn design has two layers: the data plane and the controlplane. The Longhorn Engine is a storage controller that corresponds to the data plane, and the Longhorn Manager corresponds to the controlplane. + +## 1.1. The Longhorn Manager and the Longhorn Engine + +The Longhorn Manager Pod runs on each node in the Longhorn cluster as a Kubernetes [DaemonSet.](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) It is responsible for creating and managing volumes in the Kubernetes cluster, and handles the API calls from the UI or the volume plugins for Kubernetes. It follows the Kubernetes controller pattern, which is sometimes called the operator pattern. + +The Longhorn Manager communicates with the Kubernetes API server to create a new Longhorn volume [CRD.](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) Then the Longhorn Manager watches the API server's response, and when it sees that the Kubernetes API server created a new Longhorn volume CRD, the Longhorn Manager creates a new volume. + +When the Longhorn Manager is asked to create a volume, it creates a Longhorn Engine instance on the node the volume is attached to, and it creates a replica on each node where a replica will be placed. Replicas should be placed on separate hosts to ensure maximum availability. + +The multiple data paths of the replicas ensure high availability of the Longhorn volume. Even if a problem happens with a certain replica or with the Engine, the problem won't affect all the replicas or the Pod's access to the volume. The Pod will still function normally. + +The Longhorn Engine always runs in the same node as the Pod that uses the Longhorn volume. It synchronously replicates the volume across the multiple replicas stored on multiple nodes. + +The Engine and replicas are orchestrated using Kubernetes. + +In the figure below, + +- There are three instances with Longhorn volumes. +- Each volume has a dedicated controller, which is called the Longhorn Engine and runs as a Linux process. +- Each Longhorn volume has two replicas, and each replica is a Linux process. +- The arrows in the figure indicate the read/write data flow between the volume, controller instance, replica instances, and disks. 
+- By creating a separate Longhorn Engine for each volume, if one controller fails, the function of other volumes is not impacted. + +**Figure 1. Read/write Data Flow between the Volume, Longhorn Engine, Replica Instances, and Disks** + +{{< figure alt="read/write data flow between the volume, controller instance, replica instances, and disks" src="/img/diagrams/architecture/how-longhorn-works.svg" >}} + +## 1.2. Advantages of a Microservices Based Design + +In Longhorn, each Engine only needs to serve one volume, simplifying the design of the storage controllers. Because the failure domain of the controller software is isolated to individual volumes, a controller crash will only impact one volume. + +The Longhorn Engine is simple and lightweight enough so that we can create as many as 100,000 separate engines. Kubernetes schedules these separate engines, drawing resources from a shared set of disks and working with Longhorn to form a resilient distributed block storage system. + +Because each volume has its own controller, the controller and replica instances for each volume can also be upgraded without causing a noticeable disruption in IO operations. + +Longhorn can create a long-running job to orchestrate the upgrade of all live volumes without disrupting the on-going operation of the system. To ensure that an upgrade does not cause unforeseen issues, Longhorn can choose to upgrade a small subset of the volumes and roll back to the old version if something goes wrong during the upgrade. + +## 1.3. CSI Driver + +The Longhorn CSI driver takes the block device, formats it, and mounts it on the node. Then the [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) bind-mounts the device inside a Kubernetes Pod. This allows the Pod to access the Longhorn volume. + +The required Kubernetes CSI Driver images will be deployed automatically by the longhorn driver deployer. +To install Longhorn in an air gapped environment, refer to [this section](../advanced-resources/deploy/airgap). + +## 1.4. CSI Plugin + +Longhorn is managed in Kubernetes via a [CSI Plugin.](https://kubernetes-csi.github.io/docs/) This allows for easy installation of the Longhorn plugin. + +The Kubernetes CSI plugin calls Longhorn to create volumes to create persistent data for a Kubernetes workload. The CSI plugin gives you the ability to create, delete, attach, detach, mount the volume, and take snapshots of the volume. All other functionality provided by Longhorn is implemented through the Longhorn UI. + +The Kubernetes cluster internally uses the CSI interface to communicate with the Longhorn CSI plugin. And the Longhorn CSI plugin communicates with the Longhorn Manager using the Longhorn API. + +Longhorn does leverage iSCSI, so extra configuration of the node may be required. This may include the installation of open-iscsi or iscsiadm depending on the distribution. + +## 1.5. The Longhorn UI + +The Longhorn UI interacts with the Longhorn Manager through the Longhorn API, and acts as a complement of Kubernetes. Through the Longhorn UI, you can manage snapshots, backups, nodes and disks. + +Besides, the space usage of the cluster worker nodes is collected and illustrated by the Longhorn UI. See [here](../volumes-and-nodes/node-space-usage) for details. + +# 2. Longhorn Volumes and Primary Storage + +When creating a volume, the Longhorn Manager creates the Longhorn Engine microservice and the replicas for each volume as microservices. Together, these microservices form a Longhorn volume. 
Each replica should be placed on a different node or on different disks. + +After the Longhorn Engine is created by the Longhorn Manager, it connects to the replicas. The Engine exposes a block device on the same node where the Pod is running. + +A Longhorn volume can be created with kubectl. + +### 2.1. Thin Provisioning and Volume Size + +Longhorn is a thin-provisioned storage system. That means a Longhorn volume will only take the space it needs at the moment. For example, if you allocated a 20 GB volume but only use 1GB of it, the actual data size on your disk would be 1 GB. You can see the actual data size in the volume details in the UI. + +A Longhorn volume itself cannot shrink in size if you’ve removed content from your volume. For example, if you create a volume of 20 GB, used 10 GB, then removed the content of 9 GB, the actual size on the disk would still be 10 GB instead of 1 GB. This happens because Longhorn operates on the block level, not the filesystem level, so Longhorn doesn’t know if the content has been removed by a user or not. That information is mostly kept at the filesystem level. + +For more introductions about the volume-size related concepts, see this [doc](../volumes-and-nodes/volume-size) for more details. + +### 2.2. Reverting Volumes in Maintenance Mode + +When a volume is attached from the Longhorn UI, there is a checkbox for Maintenance mode. It’s mainly used to revert a volume from a snapshot. + +The option will result in attaching the volume without enabling the frontend (block device or iSCSI), to make sure no one can access the volume data when the volume is attached. + +After v0.6.0, the snapshot reverting operation required the volume to be in maintenance mode. This is because if the block device’s content is modified while the volume is mounted or being used, it will cause filesystem corruption. + +It’s also useful to inspect the volume state without worrying about the data being accessed by accident. + +## 2.3. Replicas + +Each replica contains a chain of snapshots of a Longhorn volume. A snapshot is like a layer of an image, with the oldest snapshot used as the base layer, and newer snapshots on top. Data is only included in a new snapshot if it overwrites data in an older snapshot. Together, a chain of snapshots shows the current state of the data. + +For each Longhorn volume, multiple replicas of the volume should run in the Kubernetes cluster, each on a separate node. All replicas are treated the same, and the Longhorn Engine always runs on the same node as the pod, which is also the consumer of the volume. In that way, we make sure that even if the Pod is down, the Engine can be moved to another Pod and your service will continue undisrupted. + +The default replica count can be changed in the [settings.](../references/settings/#default-replica-count) When a volume is attached, the replica count for the volume can be changed in the UI. + +If the current healthy replica count is less than specified replica count, Longhorn will start rebuilding new replicas. + +If the current healthy replica count is more than the specified replica count, Longhorn will do nothing. In this situation, if a replica fails or is deleted, Longhorn won’t start rebuilding new replicas unless the healthy replica count dips below the specified replica count. + +Longhorn replicas are built using Linux [sparse files,](https://en.wikipedia.org/wiki/Sparse_file) which support thin provisioning. + +### 2.3.1. 
How Read and Write Operations Work for Replicas + +When data is read from a replica of a volume, if the data can be found in the live data, then that data is used. If not, the newest snapshot will be read. If the data is not found in the newest snapshot, the next-oldest snapshot is read, and so on, until the oldest snapshot is read. + +When you take a snapshot, a [differencing](https://en.wikipedia.org/wiki/Data_differencing) disk is created. As the number of snapshots grows, the differencing disk chain (also called a chain of snapshots) could get quite long. To improve read performance, Longhorn therefore maintains a read index that records which differencing disk holds valid data for each 4K block of storage. + +In the following figure, the volume has eight blocks. The read index has eight entries and is filled up lazily as read operations take place. + +A write operation resets the read index, causing it to point to the live data. The live data consists of data at some indices and empty space in other indices. + +Beyond the read index, we currently do not maintain additional metadata to indicate which blocks are used. + +**Figure 2. How the Read Index Keeps Track of Which Snapshot Holds the Most Recent Data** + +{{< figure alt="how the read index keeps track of which snapshot holds the most recent data" src="/img/diagrams/architecture/read-index.png" >}} + +The figure above is color-coded to show which blocks contain the most recent data according to the read index, and the source of the latest data is also listed in the table below: + +| Read Index | Source of the latest data | +|---------------|--------------------------------| +| 0 | Newest snapshot | +| 1 | Live data | +| 2 | Oldest snapshot | +| 3 | Oldest snapshot | +| 4 | Oldest snapshot | +| 5 | Live data | +| 6 | Live data | +| 7 | Live data | + +Note that as the green arrow shows in the figure above, Index 5 of the read index previously pointed to the second-oldest snapshot as the source of the most recent data, then it changed to point to the the live data when the 4K block of storage at Index 5 was overwritten by the live data. + +The read index is kept in memory and consumes one byte for each 4K block. The byte-sized read index means you can take as many as 254 snapshots for each volume. + +The read index consumes a certain amount of in-memory data structure for each replica. A 1 TB volume, for example, consumes 256 MB of in-memory read index. + +### 2.3.2 How New Replicas are Added + +When a new replica is added, the existing replicas are synced to the new replica. The first replica is created by taking a new snapshot from the live data. + +The following steps show a more detailed breakdown of how Longhorn adds new replicas: + +1. The Longhorn Engine is paused. +1. Let's say that the chain of snapshots within the replica consists of the live data and a snapshot. When the new replica is created, the live data becomes the newest (second) snapshot and a new, blank version of live data is created. +1. The new replica is created in WO (write-only) mode. +1. The Longhorn Engine is unpaused. +1. All the snapshots are synced. +1. The new replica is set to RW (read-write) mode. + +### 2.3.3. How Faulty Replicas are Rebuilt + +Longhorn will always try to maintain at least given number of healthy replicas for each volume. + +When the controller detects failures in one of its replicas, it marks the replica as being in an error state. 
The Longhorn Manager is responsible for initiating and coordinating the process of rebuilding the faulty replica. + +To rebuild the faulty replica, the Longhorn Manager creates a blank replica and calls the Longhorn Engine to add the blank replica into the volume's replica set. + +To add the blank replica, the Engine performs the following operations: + 1. Pauses all read and write operations. + 2. Adds the blank replica in WO (write-only) mode. + 3. Takes a snapshot of all existing replicas, which will now have a blank differencing disk at its head. + 4. Unpauses all read and write operations. Only write operations will be dispatched to the newly added replica. + 5. Starts a background process to sync all but the most recent differencing disk from a good replica to the blank replica. + 6. After the sync completes, all replicas now have consistent data, and the volume manager sets the new replica to RW (read-write) mode. + +Finally, the Longhorn Manager calls the Longhorn Engine to remove the faulty replica from its replica set. + +## 2.4. Snapshots + +The snapshot feature enables a volume to be reverted back to a certain point in history. Backups in secondary storage can also be built from a snapshot. + +When a volume is restored from a snapshot, it reflects the state of the volume at the time the snapshot was created. + +The snapshot feature is also a part of Longhorn's rebuilding process. Every time Longhorn detects a replica is down, it will automatically take a (system) snapshot and start rebuilding it on another node. + +### 2.4.1. How Snapshots Work + +A snapshot is like a layer of an image, with the oldest snapshot used as the base layer, and newer snapshots on top. Data is only included in a new snapshot if it overwrites data in an older snapshot. Together, a chain of snapshots shows the current state of the data. For a more detailed breakdown of how data is read from a replica, refer to the section on [read and write operations for replicas.](#231-how-read-and-write-operations-work-for-replicas) + +Snapshots cannot change after they are created, unless a snapshot is deleted, in which case its changes are conflated with the next most recent snapshot. New data is always written to the live version. New snapshots are always created from live data. + +To create a new snapshot, the live data becomes the newest snapshot. Then a new, blank version of the live data is created, taking the place of the old live data. + +### 2.4.2. Recurring Snapshots + +To reduce the space taken by snapshots, user can schedule a recurring snapshot or backup with a number of snapshots to retain, which will automatically create a new snapshot/backup on schedule, then clean up for any excessive snapshots/backups. + +### 2.4.3. Deleting Snapshots + +Unwanted snapshots can be manually deleted through the UI. Any system generated snapshots will be automatically marked for deletion if the deletion of any snapshot was triggered. + +In Longhorn, the latest snapshot cannot be deleted. This is because whenever a snapshot is deleted, Longhorn will conflate its content with the next snapshot, so that the next and later snapshot retains the correct content. + +But Longhorn cannot do that for the latest snapshot since there is no more recent snapshot to be conflated with the deleted snapshot. The next “snapshot” of the latest snapshot is the live volume (volume-head), which is being read/written by the user at the moment, so the conflation process cannot happen. 
+ +Instead, the latest snapshot will be marked as removed, and it will be cleaned up next time when possible. + +To clean up the latest snapshot, a new snapshot can be created, then the previous "latest" snapshot can be removed. + +### 2.4.4. Storing Snapshots + +Snapshots are stored locally, as a part of each replica of a volume. They are stored on the disk of the nodes within the Kubernetes cluster. +Snapshots are stored in the same location as the volume data on the host’s physical disk. + +### 2.4.5. Crash Consistency + +Longhorn is a crash-consistent block storage solution. + +It’s normal for the OS to keep content in the cache before writing into the block layer. This means that if all of the replicas are down, then Longhorn may not contain the changes that occurred immediately before the shutdown, because the content was kept in the OS-level cache and wasn't yet transferred to the Longhorn system. + +This problem is similar to problems that could happen if your desktop computer shuts down due to a power outage. After resuming the power, you may find some corrupted files in the hard drive. + +To force the data to be written to the block layer at any given moment, the sync command can be manually run on the node, or the disk can be unmounted. The OS would write the content from the cache to the block layer in either situation. + +Longhorn runs the sync command automatically before creating a snapshot. + +# 3. Backups and Secondary Storage + +A backup is an object in the backupstore, which is an NFS or S3 compatible object store external to the Kubernetes cluster. Backups provide a form of secondary storage so that even if your Kubernetes cluster becomes unavailable, your data can still be retrieved. + +Because the volume replication is synchronized, and because of network latency, it is hard to do cross-region replication. The backupstore is also used as a medium to address this problem. + +When the backup target is configured in the Longhorn settings, Longhorn can connect to the backupstore and show you a list of existing backups in the Longhorn UI. + +If Longhorn runs in a second Kubernetes cluster, it can also sync disaster recovery volumes to the backups in secondary storage, so that your data can be recovered more quickly in the second Kubernetes cluster. + +## 3.1. How Backups Work + +A backup is created using one snapshot as a source, so that it reflects the state of the volume's data at the time that the snapshot was created. A backup is stored remotely outside of the cluster. + +By contrast to a snapshot, a backup can be thought of as a flattened version of a chain of snapshots. Similar to the way that information is lost when a layered image is converted to a flat image, data is also lost when a chain of snapshots is converted to a backup. In both conversions, any overwritten data would be lost. + +Because backups don't contain snapshots, they don't contain the history of changes to the volume data. After you restore a volume from a backup, the volume initially contains one snapshot. This snapshot is a conflated version of all the snapshots in the original chain, and it reflects the live data of the volume at the time at the time the backup was created. + +While snapshots can be hundreds of gigabytes, backups are made of 2 MB files. + +Each new backup of the same original volume is incremental, detecting and transmitting the changed blocks between snapshots. 
This is a relatively easy task because each snapshot is a [differencing](https://en.wikipedia.org/wiki/Data_differencing) file and only stores the changes from the last snapshot. This design also means that if no blocks have changed and a backup is taken, that backup in the backupstore will show as 0 bytes. However if you were to restore from that backup it would still contain the full volume data, since it would restore the necessary blocks already present on the backupstore, that are required for a backup. + +To avoid storing a very large number of small blocks of storage, Longhorn performs backup operations using 2 MB blocks. That means that, if any 4K block in a 2MB boundary is changed, Longhorn will back up the entire 2MB block. This offers the right balance between manageability and efficiency. + +**Figure 3. The Relationship between Backups in Secondary Storage and Snapshots in Primary Storage** + +{{< figure alt="the relationship between backups in secondary storage and snapshots in primary storage" src="/img/diagrams/concepts/longhorn-backup-creation.png" >}} + +The above figure describes how backups are created from snapshots in Longhorn: + +- The Primary Storage side of the diagram shows one replica of a Longhorn volume in the Kubernetes cluster. The replica consists of a chain of four snapshots. In order from newest to oldest, the snapshots are Live Data, snap3, snap2, and snap1. +- The Secondary Storage side of the diagram shows two backups in an external object storage service such as S3. +- In Secondary Storage, the color coding for backup-from-snap2 shows that it includes both the blue change from snap1 and the green changes from snap2. No changes from snap2 overwrote the data in snap1, therefore the changes from both snap1 and snap2 are both included in backup-from-snap2. +- The backup named backup-from-snap3 reflects the state of the volume's data at the time that snap3 was created. The color coding and arrows indicate that backup-from-snap3 contains all of the dark red changes from snap3, but only one of the green changes from snap2. This is because one of the red changes in snap3 overwrote one of the green changes in snap2. This illustrates how backups don't include the full history of change, because they conflate snapshots with the snapshots that came before them. +- Each backup maintains its own set of 2 MB blocks. Each 2 MB block is backed up only once. The two backups share one green block and one blue block. + +When a backup is deleted from the secondary storage, Longhorn does not delete all the blocks that it uses. Instead, it performs a garbage collection periodically to clean up unused blocks from secondary storage. + +The 2 MB blocks for all backups belonging to the same volume are stored under a common directory and can therefore be shared across multiple backups. + +To save space, the 2 MB blocks that didn't change between backups can be reused for multiple backups that share the same backup volume in secondary storage. Because checksums are used to address the 2 MB blocks, we achieve some degree of deduplication for the 2 MB blocks in the same volume. + +Volume-level metadata is stored in volume.cfg. The metadata files for each backup (e.g., snap2.cfg) are relatively small because they only contain the [offsets](https://en.wikipedia.org/wiki/Offset_(computer_science)) and [checksums](https://en.wikipedia.org/wiki/Checksum) of all the 2 MB blocks in the backup. + +Each 2 MB block (.blk file) is compressed. + +## 3.2. 
Recurring Backups + +Backup operations can be scheduled using the recurring snapshot and backup feature, but they can also be done as needed. + +It’s recommended to schedule recurring backups for your volumes. If a backupstore is not available, it’s recommended to have the recurring snapshot scheduled instead. + +Backup creation involves copying the data through the network, so it will take time. + +## 3.3. Disaster Recovery Volumes + +A disaster recovery (DR) volume is a special volume that stores data in a backup cluster in case the whole main cluster goes down. DR volumes are used to increase the resiliency of Longhorn volumes. + +Because the main purpose of a DR volume is to restore data from backup, this type of volume doesn’t support the following actions before it is activated: + +- Creating, deleting, and reverting snapshots +- Creating backups +- Creating persistent volumes +- Creating persistent volume claims + +A DR volume can be created from a volume’s backup in the backup store. After the DR volume is created, Longhorn will monitor its original backup volume and incrementally restore from the latest backup. A backup volume is an object in the backupstore that contains multiple backups of the same volume. + +If the original volume in the main cluster goes down, the DR volume can be immediately activated in the backup cluster, so it can greatly reduce the time needed to restore the data from the backup store to the volume in the backup cluster. + +When a DR volume is activated, Longhorn will check the last backup of the original volume. If that backup has not already been restored, the restoration will be started, and the activate action will fail. Users need to wait for the restoration to complete before retrying. + +The Backup Target in the Longhorn settings cannot be updated if any DR volumes exist. + +After a DR volume is activated, it becomes a normal Longhorn volume and it cannot be deactivated. + +## 3.4. Backupstore Update Intervals, RTO, and RPO + +Typically incremental restoration is triggered by the periodic backup store update. Users can set backup store update interval in Setting - General - Backupstore Poll Interval. + +Notice that this interval can potentially impact Recovery Time Objective (RTO). If it is too long, there may be a large amount of data for the disaster recovery volume to restore, which will take a long time. + +As for Recovery Point Objective (RPO), it is determined by recurring backup scheduling of the backup volume. If recurring backup scheduling for normal volume A creates a backup every hour, then the RPO is one hour. You can check here to see how to set recurring backups in Longhorn. + +The following analysis assumes that the volume creates a backup every hour, and that incrementally restoring data from one backup takes five minutes: + +- If the Backupstore Poll Interval is 30 minutes, then there will be at most one backup worth of data since the last restoration. The time for restoring one backup is five minutes, so the RTO would be five minutes. +- If the Backupstore Poll Interval is 12 hours, then there will be at most 12 backups worth of data since last restoration. The time for restoring the backups is 5 * 12 = 60 minutes, so the RTO would be 60 minutes. + +# Appendix: How Persistent Storage Works in Kubernetes + +To understand persistent storage in Kubernetes, it is important to understand Volumes, PersistentVolumes, PersistentVolumeClaims, and StorageClasses, and how they work together. 
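+
+As a concrete reference for the concepts in this appendix, the minimal sketch below shows how these objects typically refer to one another when Longhorn provides the storage. The resource names (`example-pvc`, `example-pod`) are illustrative only; the `longhorn` StorageClass is the default one deployed with Longhorn.
+
+```yaml
+# PVC requesting storage from the Longhorn StorageClass.
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: example-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  storageClassName: longhorn   # dynamic provisioning: a matching PV is created and bound
+  resources:
+    requests:
+      storage: 2Gi
+---
+# Pod consuming the storage through the PVC.
+apiVersion: v1
+kind: Pod
+metadata:
+  name: example-pod
+spec:
+  containers:
+    - name: app
+      image: nginx:stable
+      volumeMounts:
+        - name: data
+          mountPath: /data
+  volumes:
+    - name: data
+      persistentVolumeClaim:
+        claimName: example-pvc
+```
+
+The sections below explain each piece of this relationship in more detail.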
+ +One important property of a Kubernetes Volume is that it has the same lifecycle as the Pod it belongs to. The Volume is lost if the Pod is gone. In contrast, a PersistentVolume continues to exist in the system until users delete it. Volumes can also be used to share data between containers inside the same Pod, but this isn’t the primary use case because users normally only have one container per Pod. + +A [PersistentVolume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) is a piece of persistent storage in the Kubernetes cluster, while a [PersistentVolumeClaim (PVC)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) is a request for storage. [StorageClasses](https://kubernetes.io/docs/concepts/storage/storage-classes/) allow new storage to be dynamically provisioned for workloads on demand. + +## How Kubernetes Workloads use New and Existing Persistent Storage + +Broadly speaking, there are two main ways to use persistent storage in Kubernetes: + +- Use an existing persistent volume +- Dynamically provision new persistent volumes + +### Existing Storage Provisioning + +To use an existing PV, your application will need to use a PVC that is bound to a PV, and the PV should include the minimum resources that the PVC requires. + +In other words, a typical workflow for setting up existing storage in Kubernetes is as follows: + +1. Set up persistent storage volumes, in the sense of physical or virtual storage that you have access to. +1. Add a PV that refers to the persistent storage. +1. Add a PVC that refers to the PV. +1. Mount the PVC as a volume in your workload. + +When a PVC requests a piece of storage, the Kubernetes API server will try to match that PVC with a pre-allocated PV as matching volumes become available. If a match can be found, the PVC will be bound to the PV, and the user will start to use that pre-allocated piece of storage. + +if a matching volume does not exist, PersistentVolumeClaims will remain unbound indefinitely. For example, a cluster provisioned with many 50 Gi PVs would not match a PVC requesting 100 Gi. The PVC could be bound after a 100 Gi PV is added to the cluster. + +In other words, you can create unlimited PVCs, but they will only be bound to PVs if the Kubernetes master can find a sufficient PV that has at least the amount of disk space required by the PVC. + +### Dynamic Storage Provisioning + +For dynamic storage provisioning, your application will need to use a PVC that is bound to a StorageClass. The StorageClass contains the authorization to provision new persistent volumes. + +The overall workflow for dynamically provisioning new storage in Kubernetes involves a StorageClass resource: + +1. Add a StorageClass and configure it to automatically provision new storage from the storage that you have access to. +1. Add a PVC that refers to the StorageClass. +1. Mount the PVC as a volume for your workload. + +Kubernetes cluster administrators can use a Kubernetes StorageClass to describe the “classes” of storage they offer. StorageClasses can have different capacity limits, different IOPS, or any other parameters that the provisioner supports. The storage vendor specific provisioner is be used along with the StorageClass to allocate PV automatically, following the parameters set in the StorageClass object. Also, the provisioner now has the ability to enforce the resource quotas and permission requirements for users. 
In this design, admins are freed from the unnecessary work of predicting the need for PVs and allocating them. + +When a StorageClass is used, a Kubernetes administrator is not responsible for allocating every piece of storage. The administrator just needs to give users permission to access a certain storage pool, and decide the quota for the user. Then the user can carve out the needed pieces of the storage from the storage pool. + +StorageClasses can also be used without explicitly creating a StorageClass object in Kubernetes. Since the StorageClass is also a field used to match a PVC with a PV, a PV can be created manually with a custom Storage Class name, then a PVC can be created that asks for a PV with that StorageClass name. Kubernetes can then bind your PVC to the PV with the specified StorageClass name, even if the StorageClass object doesn't exist as a Kubernetes resource. + +Longhorn introduces a Longhorn StorageClass so that your Kubernetes workloads can carve out pieces of your persistent storage as necessary. + +## Horizontal Scaling for Kubernetes Workloads with Persistent Storage + +The VolumeClaimTemplate is a StatefulSet spec property, and it provides a way for the block storage solution to scale horizontally for a Kubernetes workload. + +This property can be used to create matching PVs and PVCs for Pods that were created by a StatefulSet. + +Those PVCs are created using a StorageClass, so they can be set up automatically when the StatefulSet scales up. + +When a StatefulSet scales down, the extra PVs/PVCs are kept in the cluster, and they are reused when the StatefulSet scales up again. + +The VolumeClaimTemplate is important for block storage solutions like EBS and Longhorn. Because those solutions are inherently [ReadWriteOnce,](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) they cannot be shared between the Pods. + +Deployments don't work well with persistent storage if you have more than one Pod running with persistent data. For more than one pod, a StatefulSet should be used. diff --git a/content/docs/1.5.1/contributing.md b/content/docs/1.5.1/contributing.md new file mode 100644 index 000000000..d7bb10f47 --- /dev/null +++ b/content/docs/1.5.1/contributing.md @@ -0,0 +1,33 @@ +--- +title: Contributing +weight: 6 +--- + +Longhorn is open source software, so contributions are greatly welcome. Please read the [Cloud Native Computing Foundation Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md) and [Contributing Guidelines](https://github.com/longhorn/longhorn/blob/master/CONTRIBUTING.md) before contributing. + +Contributing code is not the only way of contributing. We value feedback very much and many of the Longhorn features are originated from users' feedback. If you have any feedback, feel free to [file an issue](https://github.com/longhorn/longhorn/issues/new/choose) and talk to the developers at the [CNCF](https://slack.cncf.io/) [#longhorn](https://cloud-native.slack.com/messages/longhorn) slack channel. + +Longhorn is a [CNCF Incubating Project.](https://www.cncf.io/projects/longhorn/) + +![Longhorn is a CNCF Incubating Project](https://raw.githubusercontent.com/cncf/artwork/master/other/cncf/horizontal/color/cncf-color.svg) + +## Source Code + +Longhorn is 100% open source software under the auspices of the [Cloud Native Computing Foundation](https://cncf.io). 
The project's source code is spread across a number of repos: + +| Component | What it does | GitHub repo | +| :----------------------------- | :--------------------------------------------------------------------- | :------------------------------------------------------------------------------------------ | +| Longhorn Backing Image Manager | Backing image download, sync, and deletion in a disk | [longhorn/backing-image-manager](https://github.com/longhorn/backing-image-manager) | +| Longhorn Engine | Core controller/replica logic | [longhorn/longhorn-engine](https://github.com/longhorn/longhorn-engine) | +| Longhorn Instance Manager | Controller/replica instance lifecycle management | [longhorn/longhorn-instance-manager](https://github.com/longhorn/longhorn-instance-manager) | +| Longhorn Manager | Longhorn orchestration, includes CSI driver for Kubernetes | [longhorn/longhorn-manager](https://github.com/longhorn/longhorn-manager) | +| Longhorn Share Manager | NFS provisioner that exposes Longhorn volumes as ReadWriteMany volumes | [longhorn/longhorn-share-manager](https://github.com/longhorn/longhorn-share-manager) | +| Longhorn UI | The Longhorn dashboard | [longhorn/longhorn-ui](https://github.com/longhorn/longhorn-ui) | + +## License + +Copyright (c) 2014-2021 The Longhorn Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0). + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. diff --git a/content/docs/1.5.1/deploy/_index.md b/content/docs/1.5.1/deploy/_index.md new file mode 100644 index 000000000..755d685d7 --- /dev/null +++ b/content/docs/1.5.1/deploy/_index.md @@ -0,0 +1,4 @@ +--- +title: Deploy +weight: 1 +--- \ No newline at end of file diff --git a/content/docs/1.5.1/deploy/accessing-the-ui/_index.md b/content/docs/1.5.1/deploy/accessing-the-ui/_index.md new file mode 100644 index 000000000..d5c2306e5 --- /dev/null +++ b/content/docs/1.5.1/deploy/accessing-the-ui/_index.md @@ -0,0 +1,41 @@ +--- +title: Accessing the UI +weight: 2 +--- + +## Prerequisites for Access and Authentication + +These instructions assume that Longhorn is installed. + +If you installed Longhorn YAML manifest, you'll need to set up an Ingress controller to allow external traffic into the cluster, and authentication will not be enabled by default. This applies to Helm and kubectl installations. For information on creating an NGINX Ingress controller with basic authentication, refer to [this section.](./longhorn-ingress) + +If Longhorn was installed as a Rancher catalog app, Rancher automatically created an Ingress controller for you with access control (the rancher-proxy). + +## Accessing the Longhorn UI + +Once Longhorn has been installed in your Kubernetes cluster, you can access the UI dashboard. + +1. 
Get the Longhorn's external service IP: + + ```shell + kubectl -n longhorn-system get svc + ``` + + For Longhorn v0.8.0, the output should look like this, and the `CLUSTER-IP` of the `longhorn-frontend` is used to access the Longhorn UI: + + ```shell + NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE + longhorn-backend ClusterIP 10.20.248.250 9500/TCP 58m + longhorn-frontend ClusterIP 10.20.245.110 80/TCP 58m + + ``` + + In the example above, the IP is `10.20.245.110`. + + > For Longhorn v0.8.0+, UI service type changed from `LoadBalancer` to `ClusterIP.` + +2. Navigate to the IP of `longhorn-frontend` in your browser. + + The Longhorn UI looks like this: + + {{< figure src="/img/screenshots/getting-started/longhorn-ui.png" >}} diff --git a/content/docs/1.5.1/deploy/accessing-the-ui/longhorn-ingress.md b/content/docs/1.5.1/deploy/accessing-the-ui/longhorn-ingress.md new file mode 100644 index 000000000..a14390342 --- /dev/null +++ b/content/docs/1.5.1/deploy/accessing-the-ui/longhorn-ingress.md @@ -0,0 +1,170 @@ +--- + title: Create an Ingress with Basic Authentication (nginx) + weight: 1 +--- + +If you install Longhorn on a Kubernetes cluster with kubectl or Helm, you will need to create an Ingress to allow external traffic to reach the Longhorn UI. + +Authentication is not enabled by default for kubectl and Helm installations. In these steps, you'll learn how to create an Ingress with basic authentication using annotations for the nginx ingress controller. + +1. Create a basic auth file `auth`. It's important the file generated is named auth (actually - that the secret has a key `data.auth`), otherwise the Ingress returns a 503. + ``` + $ USER=; PASSWORD=; echo "${USER}:$(openssl passwd -stdin -apr1 <<< ${PASSWORD})" >> auth + ``` +2. Create a secret: + ``` + $ kubectl -n longhorn-system create secret generic basic-auth --from-file=auth + ``` +3. Create an Ingress manifest `longhorn-ingress.yml` : + > Since v1.2.0, Longhorn supports uploading backing image from the UI, so please specify `nginx.ingress.kubernetes.io/proxy-body-size: 10000m` as below to ensure uploading images work as expected. + + ``` + apiVersion: networking.k8s.io/v1 + kind: Ingress + metadata: + name: longhorn-ingress + namespace: longhorn-system + annotations: + # type of authentication + nginx.ingress.kubernetes.io/auth-type: basic + # prevent the controller from redirecting (308) to HTTPS + nginx.ingress.kubernetes.io/ssl-redirect: 'false' + # name of the secret that contains the user/password definitions + nginx.ingress.kubernetes.io/auth-secret: basic-auth + # message to display with an appropriate context why the authentication is required + nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required ' + # custom max body size for file uploading like backing image uploading + nginx.ingress.kubernetes.io/proxy-body-size: 10000m + spec: + rules: + - http: + paths: + - pathType: Prefix + path: "/" + backend: + service: + name: longhorn-frontend + port: + number: 80 + ``` +4. 
Create the Ingress: + ``` + $ kubectl -n longhorn-system apply -f longhorn-ingress.yml + ``` + +e.g.: +``` +$ USER=foo; PASSWORD=bar; echo "${USER}:$(openssl passwd -stdin -apr1 <<< ${PASSWORD})" >> auth +$ cat auth +foo:$apr1$FnyKCYKb$6IP2C45fZxMcoLwkOwf7k0 + +$ kubectl -n longhorn-system create secret generic basic-auth --from-file=auth +secret/basic-auth created +$ kubectl -n longhorn-system get secret basic-auth -o yaml +apiVersion: v1 +data: + auth: Zm9vOiRhcHIxJEZueUtDWUtiJDZJUDJDNDVmWnhNY29Md2tPd2Y3azAK +kind: Secret +metadata: + creationTimestamp: "2020-05-29T10:10:16Z" + name: basic-auth + namespace: longhorn-system + resourceVersion: "2168509" + selfLink: /api/v1/namespaces/longhorn-system/secrets/basic-auth + uid: 9f66233f-b12f-4204-9c9d-5bcaca794bb7 +type: Opaque + +$ echo " +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: longhorn-ingress + namespace: longhorn-system + annotations: + # type of authentication + nginx.ingress.kubernetes.io/auth-type: basic + # prevent the controller from redirecting (308) to HTTPS + nginx.ingress.kubernetes.io/ssl-redirect: 'false' + # name of the secret that contains the user/password definitions + nginx.ingress.kubernetes.io/auth-secret: basic-auth + # message to display with an appropriate context why the authentication is required + nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required ' +spec: + rules: + - http: + paths: + - pathType: Prefix + path: "/" + backend: + service: + name: longhorn-frontend + port: + number: 80 +" | kubectl -n longhorn-system create -f - +ingress.networking.k8s.io/longhorn-ingress created + +$ kubectl -n longhorn-system get ingress +NAME HOSTS ADDRESS PORTS AGE +longhorn-ingress * 45.79.165.114,66.228.45.37,97.107.142.125 80 2m7s + +$ curl -v http://97.107.142.125/ +* Trying 97.107.142.125... +* TCP_NODELAY set +* Connected to 97.107.142.125 (97.107.142.125) port 80 (#0) +> GET / HTTP/1.1 +> Host: 97.107.142.125 +> User-Agent: curl/7.64.1 +> Accept: */* +> +< HTTP/1.1 401 Unauthorized +< Server: openresty/1.15.8.1 +< Date: Fri, 29 May 2020 11:47:33 GMT +< Content-Type: text/html +< Content-Length: 185 +< Connection: keep-alive +< WWW-Authenticate: Basic realm="Authentication Required" +< + +401 Authorization Required + +

+<center><h1>401 Authorization Required</h1></center>
+<hr><center>openresty/1.15.8.1</center>
+ + +* Connection #0 to host 97.107.142.125 left intact +* Closing connection 0 + +$ curl -v http://97.107.142.125/ -u foo:bar +* Trying 97.107.142.125... +* TCP_NODELAY set +* Connected to 97.107.142.125 (97.107.142.125) port 80 (#0) +* Server auth using Basic with user 'foo' +> GET / HTTP/1.1 +> Host: 97.107.142.125 +> Authorization: Basic Zm9vOmJhcg== +> User-Agent: curl/7.64.1 +> Accept: */* +> +< HTTP/1.1 200 OK +< Date: Fri, 29 May 2020 11:51:27 GMT +< Content-Type: text/html +< Content-Length: 1118 +< Last-Modified: Thu, 28 May 2020 00:39:41 GMT +< ETag: "5ecf084d-3fd" +< Cache-Control: max-age=0 +< + + +...... +``` + +## Additional Steps for AWS EKS Kubernetes Clusters + +You will need to create an ELB (Elastic Load Balancer) to expose the nginx Ingress controller to the Internet. Additional costs may apply. + +1. Create pre-requisite resources according to the [nginx ingress controller documentation.](https://kubernetes.github.io/ingress-nginx/deploy/#prerequisite-generic-deployment-command) + +2. Create an ELB by following [these steps.](https://kubernetes.github.io/ingress-nginx/deploy/#aws) + +## References +https://kubernetes.github.io/ingress-nginx/ diff --git a/content/docs/1.5.1/deploy/important-notes/index.md b/content/docs/1.5.1/deploy/important-notes/index.md new file mode 100644 index 000000000..d050ce481 --- /dev/null +++ b/content/docs/1.5.1/deploy/important-notes/index.md @@ -0,0 +1,108 @@ +--- +title: Important Notes +weight: 4 +--- + +This page lists important notes for Longhorn v{{< current-version >}}. +Please see [here](https://github.com/longhorn/longhorn/releases/tag/v{{< current-version >}}) for the full release note. + +## Notes + +### Supported Kubernetes Versions + +Please ensure your Kubernetes cluster is at least v1.21 before upgrading to Longhorn v{{< current-version >}} because this is the minimum version Longhorn v{{< current-version >}} supports. + +### Attachment/Detachment Refactoring Side Effect On The Upgrade Process + +In Longhorn v1.5.0, we refactored the internal volume attach/detach mechanism. +As a side effect, when you are upgrading from v1.4.x to v1.5.x, if there are in-progress operations such as volume cloning, backing image export from volume, and volume offline expansion, these operations will fail. +You will have to retry them manually. +To avoid this issue, please don't perform these operations during the upgrade. +Ref: https://github.com/longhorn/longhorn/issues/3715#issuecomment-1562305097 + +### Recurring Jobs + +The behavior of the recurring job types `Snapshot` and `Backup` will attempt to delete old snapshots first if they exceed the retained count before creating a new snapshot. Additionally, two new recurring job types have been introduced, `Snapshot Force Create` and `Backup Force Create`. They retain the original behavior of taking a snapshot or backup first before deleting outdated snapshots. + +### Longhorn Uninstallation + +To prevent Longhorn from being accidentally uninstalled (which leads to data lost), +we introduce a new setting, [deleting-confirmation-flag](../../references/settings/#deleting-confirmation-flag). +If this flag is **false**, the Longhorn uninstallation job will fail. +Set this flag to **true** to allow Longhorn uninstallation. +See more in the [uninstall](../uninstall) section. + +### Pod Security Policies Disabled & Pod Security Admission Introduction + +- Longhorn pods require privileged access to manage nodes' storage. 
In Longhorn `v1.3.x` or older, Longhorn was shipping some Pod Security Policies by default, (e.g., [link](https://github.com/longhorn/longhorn/blob/4ba39a989b4b482d51fd4bc651f61f2b419428bd/chart/values.yaml#L260)). +However, Pod Security Policy has been deprecated since Kubernetes v1.21 and removed since Kubernetes v1.25, [link](https://kubernetes.io/docs/concepts/security/pod-security-policy/). +Therefore, we stopped shipping the Pod Security Policies by default. +For Kubernetes < v1.25, if your cluster still enables Pod Security Policy admission controller, please do: + - Helm installation method: set the helm value `enablePSP` to `true` to install `longhorn-psp` PodSecurityPolicy resource which allows privileged Longhorn pods to start. + - Kubectl installation method: need to apply the [podsecuritypolicy.yaml](https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/podsecuritypolicy.yaml) manifest in addition to applying the `longhorn.yaml` manifests. + - Rancher UI installation method: set `Other Settings > Pod Security Policy` to `true` to install `longhorn-psp` PodSecurityPolicy resource which allows privileged Longhorn pods to start. + +- As a replacement for Pod Security Policy, Kubernetes provides a new mechanism, [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/). +If you enable the Pod Security Admission controller and change the default behavior to block privileged pods, +you must add the correct labels to the namespace where Longhorn pods run to allow Longhorn pods to start successfully +(because Longhorn pods require privileged access to manage storage). +For example, adding the following labels to the namespace that is running Longhorn pods: + ```yaml + apiVersion: v1 + kind: Namespace + metadata: + name: longhorn-system + labels: + pod-security.kubernetes.io/enforce: privileged + pod-security.kubernetes.io/enforce-version: latest + pod-security.kubernetes.io/audit: privileged + pod-security.kubernetes.io/audit-version: latest + pod-security.kubernetes.io/warn: privileged + pod-security.kubernetes.io/warn-version: latest + ``` + +### Updating CSI Snapshot CRD `v1beta1` to `v1`, `v1beta1` Removed + +Support for the `v1beta1` version of CSI snapshot CRDs was previously deprecated in favor of the `v1` version. +The CSI components in Longhorn v{{< current-version >}} only function with the `v1` version. +Please follow the instructions at [Enable CSI Snapshot Support](../../snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support) to update CSI snapshot CRDs and the CSI snapshot controller. +If you have Longhorn volume manifests or scripts that are still using `v1beta1` version, you must upgrade them to `v1` as well. + +### `Custom mkfs.ext4 Parameters` Setting Removed + +The `Custom mkfs.ext4 Parameters` setting was deprecated in Longhorn `v1.4.0` and is now removed. The per-StorageClass `mkfsParams` parameter should be used to specify mkfs options (e.g., `-I 256 -b 4096 -O ^metadata_csum,^64bit`) instead. See [Creating Longhorn Volumes with kubectl](../../volumes-and-nodes/create-volumes/#creating-longhorn-volumes-with-kubectl) for details. + +### `Disable Replica Rebuild` Setting Removed + +The `Disable Replica Rebuild` setting was deprecated and replaced by the [Concurrent Replica Rebuild Per Node Limit](../../references/settings/#concurrent-replica-rebuild-per-node-limit) setting in Longhorn `v1.2.1`. 
It should already have been ignored in any Longhorn deployment upgrading to Longhorn v{{< current-version >}} and is now removed. To disable replica rebuilding across the cluster, set the `Concurrent Replica Rebuild Per Node Limit` to 0. + +### `Default Manager Image` Settings Removed + +The `Default Backing Image Manager Image`, `Default Instance Manager Image` and `Default Share Manager Image` settings were deprecated and removed from `v1.5.0`. These default manager image settings can be changed on the manager starting command line only. They should be modified in the Longhorn deploying manifest or `values.yaml` in Longhorn chart. + +### `Allow Node Drain with the Last Healthy Replica` Settings Removed +The `Allow Node Drain with the Last Healthy Replica` setting was deprecated in Longhorn v1.4.2 and is now removed. +Please use the new setting [Node Drain Policy](../../references/settings#node-drain-policy) instead. + +### Instance Managers Consolidated + +Engine instance mangers and replica instance managers has been consolidated. Previous engine/replica instance managers are now deprecated, but they will still provide service to the existing attached volumes. + +The `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` settings are removed and replaced by `Guaranteed Instance Manager CPU`. + +The `engineManagerCPURequest` and `replicaManagerCPURequest` fields in Longhorn Node custom resource spec are removed and replaced by `instanceManagerCPURequest`. + +### Custom Resource Fields Removed + +Starting from `v1.5.0`, the following deprecated custom resource fields will be removed: +- Volume.spec.recurringJob +- Volume.spec.baseImage +- Replica.spec.baseImage +- Replica.spec.dataPath +- InstanceManager.spec.engineImage +- BackingImage.spec.imageURL +- BackingImage.status.diskDownloadProgressMap +- BackingImage.status.diskDownloadStateMap +- BackingImageManager.status.backingImageFileMap.directory +- BackingImageManager.status.backingImageFileMap.downloadProgress +- BackingImageManager.status.backingImageFileMap.url diff --git a/content/docs/1.5.1/deploy/install/_index.md b/content/docs/1.5.1/deploy/install/_index.md new file mode 100644 index 000000000..7117ae056 --- /dev/null +++ b/content/docs/1.5.1/deploy/install/_index.md @@ -0,0 +1,254 @@ +--- +title: Quick Installation +description: Install Longhorn on Kubernetes +weight: 1 +--- + +> **Note**: This quick installation guide uses some configurations which are not for production usage. +> Please see [Best Practices](../../best-practices/) for how to configure Longhorn for production usage. + +Longhorn can be installed on a Kubernetes cluster in several ways: + +- [Rancher catalog app](./install-with-rancher) +- [kubectl](./install-with-kubectl/) +- [Helm](./install-with-helm/) + +To install Longhorn in an air gapped environment, refer to [this section.](../../advanced-resources/deploy/airgap) + +For information on customizing Longhorn's default settings, refer to [this section.](../../advanced-resources/deploy/customizing-default-settings) + +For information on deploying Longhorn on specific nodes and rejecting general workloads for those nodes, refer to the section on [taints and tolerations.](../../advanced-resources/deploy/taint-toleration) + +# Installation Requirements + +Each node in the Kubernetes cluster where Longhorn is installed must fulfill the following requirements: + +- A container runtime compatible with Kubernetes (Docker v1.13+, containerd v1.3.7+, etc.) 
+- Kubernetes >= v1.21 +- `open-iscsi` is installed, and the `iscsid` daemon is running on all the nodes. This is necessary, since Longhorn relies on `iscsiadm` on the host to provide persistent volumes to Kubernetes. For help installing `open-iscsi`, refer to [this section.](#installing-open-iscsi) +- RWX support requires that each node has a NFSv4 client installed. + - For installing a NFSv4 client, refer to [this section.](#installing-nfsv4-client) +- The host filesystem supports the `file extents` feature to store the data. Currently we support: + - ext4 + - XFS +- `bash`, `curl`, `findmnt`, `grep`, `awk`, `blkid`, `lsblk` must be installed. +- [Mount propagation](https://kubernetes-csi.github.io/docs/deploying.html#enabling-mount-propagation) must be enabled. + +The Longhorn workloads must be able to run as root in order for Longhorn to be deployed and operated properly. + +[This script](#using-the-environment-check-script) can be used to check the Longhorn environment for potential issues. + +For the minimum recommended hardware, refer to the [best practices guide.](../../best-practices/#minimum-recommended-hardware) + +### OS/Distro Specific Configuration + +- **Google Kubernetes Engine (GKE)** requires some additional setup for Longhorn to function properly. If you're a GKE user, refer to [this section](../../advanced-resources/os-distro-specific/csi-on-gke) for details. +- **K3s clusters** require some extra setup. Refer to [this section](../../advanced-resources/os-distro-specific/csi-on-k3s) +- **RKE clusters with CoreOS** need [this configuration.](../../advanced-resources/os-distro-specific/csi-on-rke-and-coreos) + +### Using the Environment Check Script + +We've written a script to help you gather enough information about the factors. + +Note `jq` maybe required to be installed locally prior to running env check script. + +To run script: + +```shell +curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh | bash +``` + +Example result: + +```shell +[INFO] Required dependencies are installed. +[INFO] Waiting for longhorn-environment-check pods to become ready (0/3)... +[INFO] All longhorn-environment-check pods are ready (3/3). +[INFO] Required packages are installed. +[INFO] Cleaning up longhorn-environment-check pods... +[INFO] Cleanup completed. +``` + +### Pod Security Policy + +Starting with v1.0.2, Longhorn is shipped with a default Pod Security Policy that will give Longhorn the necessary privileges to be able to run properly. + +No special configuration is needed for Longhorn to work properly on clusters with Pod Security Policy enabled. + +### Notes on Mount Propagation + +If your Kubernetes cluster was provisioned by Rancher v2.0.7+ or later, the MountPropagation feature is enabled by default. + +If MountPropagation is disabled, Base Image feature will be disabled. + +### Root and Privileged Permission + +Longhorn components require root access with privileged permissions to achieve volume operations and management, because Longhorn relies on system resources on the host across different namespaces, for example, Longhorn uses `nsenter` to understand block devices' usage or encrypt/decrypt volumes on the host. + +Below are the directories Longhorn components requiring access with root and privileged permissions : + +- Longhorn Manager + - /dev: Block devices created by Longhorn are under the `/dev` path. 
+ - /proc: Find the recognized host process like container runtime, then use `nsenter` to access the mounts on the host to understand disks usage. + - /var/lib/longhorn: The default path for storing volume data on a host. +- Longhorn Engine Image + - /var/lib/longhorn/engine-binaries: The default path for storing the Longhorn engine binaries. +- Longhorn Instance Manager + - /: Access any data path on this node and access Longhorn engine binaries. + - /dev: Block devices created by Longhorn are under the `/dev` path. + - /proc: Find the recognized host process like container runtime, then use `nsenter` to manage iSCSI targets and initiators, also some file system +- Longhorn Share Manager + - /dev: Block devices created by Longhorn are under the `/dev` path. + - /lib/modules: Kernel modules required by `cryptsetup` for volume encryption. + - /proc: Find the recognized host process like container runtime, then use `nsenter` for volume encryption. + - /sys: Support volume encryption by `cryptsetup`. +- Longhorn CSI Plugin + - /: For host checks via the NFS customer mounter (deprecated). Note that, this will be removed in the future release. + - /dev: Block devices created by Longhorn are under the `/dev` path. + - /lib/modules: Kernel modules required by Longhorn CSI plugin. + - /sys: Support volume encryption by `cryptsetup`. + - /var/lib/kubelet/plugins/kubernetes.io/csi: The path where the Longhorn CSI plugin creates the staging path (via `NodeStageVolume`) of a block device. The staging path will be bind-mounted to the target path `/var/lib/kubelet/pods` (via `NodePublishVolume`) for support single volume could be mounted to multiple Pods. + - /var/lib/kubelet/plugins_registry: The path where the node-driver-registrar registers the CSI plugin with kubelet. + - /var/lib/kubelet/plugins/driver.longhorn.io: The path where the socket for the communication between kubelet Longhorn CSI driver. + - /var/lib/kubelet/pods: The path where the Longhorn CSI driver mounts volume from the target path (via `NodePublishVolume`). +- Longhorn CSI Attacher/Provisioner/Resizer/Snapshotter + - /var/lib/kubelet/plugins/driver.longhorn.io: The path where the socket for the communication between kubelet Longhorn CSI driver. +- Longhorn Backing Image Manager + - /var/lib/longhorn: The default path for storing data on the host. +- Longhorn Backing Image Data Source + - /var/lib/longhorn: The default path for storing data on the host. +- Longhorn System Restore Rollout + - /var/lib/longhorn/engine-binaries: The default path for storing the Longhorn engine binaries. + +### Installing open-iscsi + +The command used to install `open-iscsi` differs depending on the Linux distribution. + +For GKE, we recommend using Ubuntu as the guest OS image since it contains`open-iscsi` already. + +You may need to edit the cluster security group to allow SSH access. + +For SUSE and openSUSE, use this command: + +``` +zypper install open-iscsi +``` + +For Debian and Ubuntu, use this command: + +``` +apt-get install open-iscsi +``` + +For RHEL, CentOS, and EKS with EKS Kubernetes Worker AMI with AmazonLinux2 image, use below commands: + +``` +yum --setopt=tsflags=noscripts install iscsi-initiator-utils +echo "InitiatorName=$(/sbin/iscsi-iname)" > /etc/iscsi/initiatorname.iscsi +systemctl enable iscsid +systemctl start iscsid +``` + +Please ensure iscsi_tcp module has been loaded before iscsid service starts. Generally, it should be automatically loaded along with the package installation. 
+
+```
+modprobe iscsi_tcp
+```
+
+We also provide an `iscsi` installer to make it easier for users to install `open-iscsi` automatically:
+```
+kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/prerequisite/longhorn-iscsi-installation.yaml
+```
+After the deployment, run the following command to check the status of the installer pods:
+```
+kubectl get pod | grep longhorn-iscsi-installation
+longhorn-iscsi-installation-49hd7   1/1   Running   0     21m
+longhorn-iscsi-installation-pzb7r   1/1   Running   0     39m
+```
+You can also check the log with the following command to see the installation result:
+```
+kubectl logs longhorn-iscsi-installation-pzb7r -c iscsi-installation
+...
+Installed:
+  iscsi-initiator-utils.x86_64 0:6.2.0.874-7.amzn2
+
+Dependency Installed:
+  iscsi-initiator-utils-iscsiuio.x86_64 0:6.2.0.874-7.amzn2
+
+Complete!
+Created symlink from /etc/systemd/system/multi-user.target.wants/iscsid.service to /usr/lib/systemd/system/iscsid.service.
+iscsi install successfully
+```
+
+In rare cases, it may be necessary to modify the installed SELinux policy to get Longhorn working. If you are running
+an up-to-date version of a Fedora downstream distribution (e.g. Fedora, RHEL, Rocky, CentOS, etc.) and plan to leave
+SELinux enabled, see [the KB](../../../../kb/troubleshooting-volume-attachment-fails-due-to-selinux-denials) for details.
+
+### Installing NFSv4 client
+
+In the Longhorn system, the backup feature requires NFSv4, v4.1, or v4.2, and the ReadWriteMany (RWX) volume feature requires NFSv4.1. Before installing the NFSv4 client userspace daemon and utilities, make sure that client kernel support is enabled on each Longhorn node.
+
+- Check that `NFSv4.1` support is enabled in the kernel
+  ```
+  cat /boot/config-`uname -r` | grep CONFIG_NFS_V4_1
+  ```
+
+- Check that `NFSv4.2` support is enabled in the kernel
+  ```
+  cat /boot/config-`uname -r` | grep CONFIG_NFS_V4_2
+  ```
+
+
+The command used to install an NFSv4 client differs depending on the Linux distribution.
+
+- For Debian and Ubuntu, use this command:
+  ```
+  apt-get install nfs-common
+  ```
+
+- For RHEL, CentOS, and EKS with `EKS Kubernetes Worker AMI with AmazonLinux2 image`, use this command:
+  ```
+  yum install nfs-utils
+  ```
+
+- For SUSE and openSUSE, you can install an NFSv4 client via:
+  ```
+  zypper install nfs-client
+  ```
+
+We also provide an `nfs` installer to make it easier for users to install `nfs-client` automatically:
+```
+kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/prerequisite/longhorn-nfs-installation.yaml
+```
+After the deployment, run the following command to check the status of the installer pods:
+```
+kubectl get pod | grep longhorn-nfs-installation
+NAME                                  READY   STATUS    RESTARTS   AGE
+longhorn-nfs-installation-t2v9v   1/1   Running   0     143m
+longhorn-nfs-installation-7nphm   1/1   Running   0     143m
+```
+You can also check the log with the following command to see the installation result:
+```
+kubectl logs longhorn-nfs-installation-t2v9v -c nfs-installation
+...
+nfs install successfully
+```
+
+### Checking the Kubernetes Version
+
+Use the following command to check your Kubernetes server version:
+
+```shell
+kubectl version
+```
+
+Result:
+
+```shell
+Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
+Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0+k3s1", GitCommit:"2705431d9645d128441c578309574cd262285ae6", GitTreeState:"clean", BuildDate:"2021-04-26T21:45:52Z", GoVersion:"go1.16.2", Compiler:"gc", Platform:"linux/amd64"}
+```
+
+The `Server Version` should be >= v1.21.
+
+
diff --git a/content/docs/1.5.1/deploy/install/install-with-helm.md b/content/docs/1.5.1/deploy/install/install-with-helm.md
new file mode 100644
index 000000000..d2f69f9ce
--- /dev/null
+++ b/content/docs/1.5.1/deploy/install/install-with-helm.md
@@ -0,0 +1,83 @@
+---
+title: Install with Helm
+weight: 9
+---
+
+In this section, you will learn how to install Longhorn with Helm.
+
+### Prerequisites
+
+- Each node in the Kubernetes cluster where Longhorn will be installed must fulfill [these requirements.](../#installation-requirements)
+- Helm v2.0+ must be installed on your workstation.
+  1. Refer to the official documentation for help installing Helm.
+  2. If you're using a Helm version prior to v3.0, you need to [install Tiller in your Kubernetes cluster with role-based access control (RBAC)](https://v2.helm.sh/docs/using_helm/#tiller-namespaces-and-rbac).
+
+> [This script](https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/scripts/environment_check.sh) can be used to check the Longhorn environment for potential issues.
+
+### Installing Longhorn
+
+
+> **Note**:
+> * The initial settings for Longhorn can be [customized using Helm options or by editing the deployment configuration file.](../../../advanced-resources/deploy/customizing-default-settings/#using-helm)
+> * For Kubernetes < v1.25, if your cluster still enables the Pod Security Policy admission controller, set the Helm value `enablePSP` to `true` to install the `longhorn-psp` PodSecurityPolicy resource, which allows privileged Longhorn pods to start.
+
+
+1. Add the Longhorn Helm repository:
+
+    ```shell
+    helm repo add longhorn https://charts.longhorn.io
+    ```
+
+2. Fetch the latest charts from the repository:
+
+    ```shell
+    helm repo update
+    ```
+
+3. Install Longhorn in the `longhorn-system` namespace.
+
+    To install Longhorn with Helm 2, use this command:
+
+    ```shell
+    helm install longhorn/longhorn --name longhorn --namespace longhorn-system --version {{< current-version >}}
+    ```
+
+    To install Longhorn with Helm 3, use this command:
+
+    ```shell
+    helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version {{< current-version >}}
+    ```
+
+4. 
To confirm that the deployment succeeded, run: + + ```bash + kubectl -n longhorn-system get pod + ``` + + The result should look like the following: + + ```bash + NAME READY STATUS RESTARTS AGE + longhorn-ui-b7c844b49-w25g5 1/1 Running 0 2m41s + longhorn-manager-pzgsp 1/1 Running 0 2m41s + longhorn-driver-deployer-6bd59c9f76-lqczw 1/1 Running 0 2m41s + longhorn-csi-plugin-mbwqz 2/2 Running 0 100s + csi-snapshotter-588457fcdf-22bqp 1/1 Running 0 100s + csi-snapshotter-588457fcdf-2wd6g 1/1 Running 0 100s + csi-provisioner-869bdc4b79-mzrwf 1/1 Running 0 101s + csi-provisioner-869bdc4b79-klgfm 1/1 Running 0 101s + csi-resizer-6d8cf5f99f-fd2ck 1/1 Running 0 101s + csi-provisioner-869bdc4b79-j46rx 1/1 Running 0 101s + csi-snapshotter-588457fcdf-bvjdt 1/1 Running 0 100s + csi-resizer-6d8cf5f99f-68cw7 1/1 Running 0 101s + csi-attacher-7bf4b7f996-df8v6 1/1 Running 0 101s + csi-attacher-7bf4b7f996-g9cwc 1/1 Running 0 101s + csi-attacher-7bf4b7f996-8l9sw 1/1 Running 0 101s + csi-resizer-6d8cf5f99f-smdjw 1/1 Running 0 101s + instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc 1/1 Running 0 114s + engine-image-ei-df38d2e5-cv6nc 1/1 Running 0 114s + ``` + +5. To enable access to the Longhorn UI, you will need to set up an Ingress controller. Authentication to the Longhorn UI is not enabled by default. For information on creating an NGINX Ingress controller with basic authentication, refer to [this section.](../../accessing-the-ui/longhorn-ingress) + +6. Access the Longhorn UI using [these steps.](../../accessing-the-ui) diff --git a/content/docs/1.5.1/deploy/install/install-with-kubectl.md b/content/docs/1.5.1/deploy/install/install-with-kubectl.md new file mode 100644 index 000000000..c18cf6d20 --- /dev/null +++ b/content/docs/1.5.1/deploy/install/install-with-kubectl.md @@ -0,0 +1,145 @@ +--- +title: Install with Kubectl +description: Install Longhorn with the kubectl client. +weight: 8 +--- + +## Prerequisites + +Each node in the Kubernetes cluster where Longhorn will be installed must fulfill [these requirements.](../#installation-requirements) + +[This script](https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/scripts/environment_check.sh) can be used to check the Longhorn environment for potential issues. + +The initial settings for Longhorn can be customized by [editing the deployment configuration file.](../../../advanced-resources/deploy/customizing-default-settings/#using-the-longhorn-deployment-yaml-file) + +## Installing Longhorn + +1. Install Longhorn on any Kubernetes cluster using this command: + + ```shell + kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/longhorn.yaml + ``` + + One way to monitor the progress of the installation is to watch pods being created in the `longhorn-system` namespace: + + ```shell + kubectl get pods \ + --namespace longhorn-system \ + --watch + ``` + +2. 
Check that the deployment was successful: + + ```shell + $ kubectl -n longhorn-system get pod + NAME READY STATUS RESTARTS AGE + longhorn-ui-b7c844b49-w25g5 1/1 Running 0 2m41s + longhorn-manager-pzgsp 1/1 Running 0 2m41s + longhorn-driver-deployer-6bd59c9f76-lqczw 1/1 Running 0 2m41s + longhorn-csi-plugin-mbwqz 2/2 Running 0 100s + csi-snapshotter-588457fcdf-22bqp 1/1 Running 0 100s + csi-snapshotter-588457fcdf-2wd6g 1/1 Running 0 100s + csi-provisioner-869bdc4b79-mzrwf 1/1 Running 0 101s + csi-provisioner-869bdc4b79-klgfm 1/1 Running 0 101s + csi-resizer-6d8cf5f99f-fd2ck 1/1 Running 0 101s + csi-provisioner-869bdc4b79-j46rx 1/1 Running 0 101s + csi-snapshotter-588457fcdf-bvjdt 1/1 Running 0 100s + csi-resizer-6d8cf5f99f-68cw7 1/1 Running 0 101s + csi-attacher-7bf4b7f996-df8v6 1/1 Running 0 101s + csi-attacher-7bf4b7f996-g9cwc 1/1 Running 0 101s + csi-attacher-7bf4b7f996-8l9sw 1/1 Running 0 101s + csi-resizer-6d8cf5f99f-smdjw 1/1 Running 0 101s + instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc 1/1 Running 0 114s + engine-image-ei-df38d2e5-cv6nc 1/1 Running 0 114s + ``` +3. To enable access to the Longhorn UI, you will need to set up an Ingress controller. Authentication to the Longhorn UI is not enabled by default. For information on creating an NGINX Ingress controller with basic authentication, refer to [this section.](../../accessing-the-ui/longhorn-ingress) +4. Access the Longhorn UI using [these steps.](../../accessing-the-ui) + +> **Note**: +> For Kubernetes < v1.25, if your cluster still enables Pod Security Policy admission controller, need to apply the [podsecuritypolicy.yaml](https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/podsecuritypolicy.yaml) manifest in addition to applying the `longhorn.yaml` manifests. + + + +### List of Deployed Resources + + +The following items will be deployed to Kubernetes: + +#### Namespace: longhorn-system + +All Longhorn bits will be scoped to this namespace. + +#### ServiceAccount: longhorn-service-account + +Service account is created in the longhorn-system namespace. 
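+
+To verify that the service account was created (an optional check, not part of the official install steps), you can run:
+
+```shell
+kubectl -n longhorn-system get serviceaccount longhorn-service-account
+```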
+ +#### ClusterRole: longhorn-role + +This role will have access to: + - In apiextension.k8s.io (All verbs) + - customresourcedefinitions + - In core (All verbs) + - pods + - /logs + - events + - persistentVolumes + - persistentVolumeClaims + - /status + - nodes + - proxy/nodes + - secrets + - services + - endpoints + - configMaps + - In core + - namespaces (get, list) + - In apps (All Verbs) + - daemonsets + - statefulSets + - deployments + - In batch (All Verbs) + - jobs + - cronjobs + - In storage.k8s.io (All verbs) + - storageclasses + - volumeattachments + - csinodes + - csidrivers + - In coordination.k8s.io + - leases + +#### ClusterRoleBinding: longhorn-bind + +This connects the longhorn-role to the longhorn-service-account in the longhorn-system namespace + +#### CustomResourceDefinitions + +The following CustomResourceDefinitions will be installed + +- In longhorn.io + - backingimagedatasources + - backingimagemanagers + - backingimages + - backups + - backuptargets + - backupvolumes + - engineimages + - engines + - instancemanagers + - nodes + - recurringjobs + - replicas + - settings + - sharemanagers + - volumes + +#### Kubernetes API Objects + +- A config map with the default settings +- The longhorn-manager DaemonSet +- The longhorn-backend service exposing the longhorn-manager DaemonSet internally to Kubernetes +- The longhorn-ui Deployment +- The longhorn-frontend service exposing the longhorn-ui internally to Kubernetes +- The longhorn-driver-deployer that deploys the CSI driver +- The longhorn StorageClass + diff --git a/content/docs/1.5.1/deploy/install/install-with-rancher.md b/content/docs/1.5.1/deploy/install/install-with-rancher.md new file mode 100644 index 000000000..af8c7b877 --- /dev/null +++ b/content/docs/1.5.1/deploy/install/install-with-rancher.md @@ -0,0 +1,39 @@ +--- +title: Install as a Rancher Apps & Marketplace +description: Run Longhorn on Kubernetes with Rancher 2.x +weight: 7 +--- + +One benefit of installing Longhorn through Rancher Apps & Marketplace is that Rancher provides authentication to the Longhorn UI. + +If there is a new version of Longhorn available, you will see an `Upgrade Available` sign on the `Apps & Marketplace` screen. You can click `Upgrade` button to upgrade Longhorn manager. See more about upgrade [here](../../upgrade). + +## Prerequisites + +Each node in the Kubernetes cluster where Longhorn is installed must fulfill [these requirements.](../#installation-requirements) + +[This script](https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/scripts/environment_check.sh) can be used to check the Longhorn environment for potential issues. + +## Installation + +> **Note**: +> * For Kubernetes < v1.25, if your cluster still enables Pod Security Policy admission controller, set `Other Settings > Pod Security Policy` to `true` to install `longhorn-psp` PodSecurityPolicy resource which allows privileged Longhorn pods to start. + +1. Optional: If Rancher version is 2.5.9 or before, we recommend creating a new project for Longhorn, for example, `Storage`. +2. Navigate to the cluster where you will install Longhorn. + {{< figure src="/img/screenshots/install/rancher-2.6/select-project.png" >}} +3. Navigate to the `Apps & Marketplace` screen. + {{< figure src="/img/screenshots/install/rancher-2.6/apps-launch.png" >}} +4. Find the Longhorn item in the charts and click it. + {{< figure src="/img/screenshots/install/rancher-2.6/longhorn.png" >}} +5. Click **Install**. 
+ {{< figure src="/img/screenshots/install/rancher-2.6/longhorn-chart.png" >}} +6. Optional: Select the project where you want to install Longhorn. +7. Optional: Customize the default settings. + {{< figure src="/img/screenshots/install/rancher-2.6/launch-longhorn.png" >}} +8. Click Next. Longhorn will be installed in the longhorn-system namespace. + {{< figure src="/img/screenshots/install/rancher-2.6/installed-longhorn.png" >}} +9. Click the Longhorn App Icon to navigate to the Longhorn dashboard. + {{< figure src="/img/screenshots/install/rancher-2.6/dashboard.png" >}} + +After Longhorn has been successfully installed, you can access the Longhorn UI by navigating to the `Longhorn` option from Rancher left panel. diff --git a/content/docs/1.5.1/deploy/uninstall/_index.md b/content/docs/1.5.1/deploy/uninstall/_index.md new file mode 100644 index 000000000..c0431b58e --- /dev/null +++ b/content/docs/1.5.1/deploy/uninstall/_index.md @@ -0,0 +1,140 @@ +--- +title: Uninstall Longhorn +weight: 6 +--- + +In this section, you'll learn how to uninstall Longhorn. + + +- [Prerequisite](#prerequisite) +- [Uninstalling Longhorn from the Rancher UI](#uninstalling-longhorn-from-the-rancher-ui) +- [Uninstalling Longhorn using Helm](#uninstalling-longhorn-using-helm) +- [Uninstalling Longhorn using kubectl](#uninstalling-longhorn-using-kubectl) +- [Troubleshooting](#troubleshooting) + +### Prerequisite +To prevent Longhorn from being accidentally uninstalled (which leads to data lost), +we introduce a new setting, [deleting-confirmation-flag](../../references/settings/#deleting-confirmation-flag). +If this flag is **false**, the Longhorn uninstallation job will fail. +Set this flag to **true** to allow Longhorn uninstallation. +You can set this flag using setting page in Longhorn UI or `kubectl -n longhorn-system patch -p '{"value": "true"}' --type=merge lhs deleting-confirmation-flag` + + +To prevent damage to the Kubernetes cluster, we recommend deleting all Kubernetes workloads using Longhorn volumes (PersistentVolume, PersistentVolumeClaim, StorageClass, Deployment, StatefulSet, DaemonSet, etc). + +### Uninstalling Longhorn from the Rancher UI + +From Rancher UI, navigate to `Catalog Apps` tab and delete Longhorn app. + +### Uninstalling Longhorn using Helm + +Run this command: + +``` +helm uninstall longhorn -n longhorn-system +``` + +### Uninstalling Longhorn using kubectl + +1. Create the uninstallation job to clean up CRDs from the system and wait for success: + + ``` + kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/uninstall/uninstall.yaml + kubectl get job/longhorn-uninstall -n longhorn-system -w + ``` + + Example output: + ``` + $ kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/uninstall/uninstall.yaml + serviceaccount/longhorn-uninstall-service-account created + clusterrole.rbac.authorization.k8s.io/longhorn-uninstall-role created + clusterrolebinding.rbac.authorization.k8s.io/longhorn-uninstall-bind created + job.batch/longhorn-uninstall created + + $ kubectl get job/longhorn-uninstall -n longhorn-system -w + NAME COMPLETIONS DURATION AGE + longhorn-uninstall 0/1 3s 3s + longhorn-uninstall 1/1 20s 20s + ``` + +2. 
Remove remaining components: + ``` + kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/longhorn.yaml + kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/uninstall/uninstall.yaml + ``` + +> **Tip:** If you try `kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/longhorn.yaml` first and get stuck there, +pressing `Ctrl C` then running `kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/uninstall/uninstall.yaml` can also help you remove Longhorn. Finally, don't forget to cleanup remaining components. + + + + +### Troubleshooting +#### Uninstalling using Rancher UI or Helm failed, I am not sure why +You might want to check the logs of the `longhorn-uninstall-xxx` pod inside `longhorn-system` namespace to see why it failed. +One reason can be that [deleting-confirmation-flag](../../references/settings/#deleting-confirmation-flag) is `false`. +You can set it to `true` by using setting page in Longhorn UI or `kubectl -n longhorn-system patch -p '{"value": "true"}' --type=merge lhs deleting-confirmation-flag` +then retry the Helm/Rancher uninstallation. + +If the uninstallation was an accident (you don't actually want to uninstall Longhorn), +you can cancel the uninstallation as the following. +1. If you use Rancher UI to deploy Longhorn + 1. Open a kubectl shell on Rancher UI + 1. Find the latest revision of Longhorn release + ```shell + > helm list -n longhorn-system -a + NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION + longhorn longhorn-system 2 2022-10-14 01:22:36.929130451 +0000 UTC uninstalling longhorn-100.2.3+up1.3.2-rc1 v1.3.2-rc1 + longhorn-crd longhorn-system 3 2022-10-13 22:19:05.976625081 +0000 UTC deployed longhorn-crd-100.2.3+up1.3.2-rc1 v1.3.2-rc1 + ``` + 1. Rollback to the latest revision + ```shell + > helm rollback longhorn 2 -n longhorn-system + checking 22 resources for changes + ... + Rollback was a success! Happy Helming! + ``` +1. If you use Helm deploy Longhorn + 1. Open a kubectl terminal + 1. Find the latest revision of Longhorn release + ```shell + ➜ helm list --namespace longhorn-system -a + NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION + longhorn longhorn-system 1 2022-10-14 13:45:25.341292504 -0700 PDT uninstalling longhorn-1.4.0-dev v1.4.0-dev + ``` + 1. Rollback to the latest revision + ```shell + ➜ helm rollback longhorn 1 -n longhorn-system + Rollback was a success! Happy Helming! + ``` + + +#### I deleted the Longhorn App from Rancher UI instead of following the uninstallation procedure + +Redeploy the (same version) Longhorn App. Follow the uninstallation procedure above. + +#### Problems with CRDs + +If your CRD instances or the CRDs themselves can't be deleted for whatever reason, run the commands below to clean up. Caution: this will wipe all Longhorn state! + +```shell +# Delete CRD finalizers, instances and definitions +for crd in $(kubectl get crd -o jsonpath={.items[*].metadata.name} | tr ' ' '\n' | grep longhorn.rancher.io); do + kubectl -n ${NAMESPACE} get $crd -o yaml | sed "s/\- longhorn.rancher.io//g" | kubectl apply -f - + kubectl -n ${NAMESPACE} delete $crd --all + kubectl delete crd/$crd +done +``` + +#### Volume can be attached/detached from UI, but Kubernetes Pod/StatefulSet etc cannot use it + +Check if volume plugin directory has been set correctly. This is automatically detected unless user explicitly set it. 
Note: The FlexVolume plugin is deprecated as of Longhorn v0.8.0 and should no longer be used. + +By default, Kubernetes uses `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/`, as stated in the [official document](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-storage/flexvolume.md/#prerequisites). + +Some vendors choose to change the directory for various reasons. For example, GKE uses `/home/kubernetes/flexvolume` instead. + +User can find the correct directory by running `ps aux|grep kubelet` on the host and check the `--volume-plugin-dir` parameter. If there is none, the default `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/` will be used. + +--- +Please see [link](https://github.com/longhorn/longhorn) for more information. diff --git a/content/docs/1.5.1/deploy/upgrade/_index.md b/content/docs/1.5.1/deploy/upgrade/_index.md new file mode 100644 index 000000000..949f0b567 --- /dev/null +++ b/content/docs/1.5.1/deploy/upgrade/_index.md @@ -0,0 +1,67 @@ +--- +title: Upgrade +weight: 3 +--- + +Here we cover how to upgrade to the latest Longhorn from all previous releases. + +# Deprecation & Incompatibility + +There are no deprecated or incompatible changes introduced in v{{< current-version >}}. + +# Upgrade Path Enforcement + +Since Longhorn v1.5.0, Longhorn only allows upgrades from supported versions. If upgrading from any unsupported version, the upgrade will fail. However, users can revert to the previous state without any service interruption or downtime. + +In addition, Longhorn disallows downgrading to any previous version to prevent unexpected system statuses caused by potential function incompatibility, deprecation, or removal. Please refer to the following matrix to understand the supported upgrade versions: + + | Current version | Upgrading version | Allow | Example | + | :-: | :-: | :-: | :-: | + | x.y.* | x.(y+1).* | ✓ | v1.4.2 to v1.5.0 | + | x.y.* | x.y.(*+n) | ✓ | v1.5.0 to v1.5.1 | + | x.y[^lastMinorVersion].* | (x+1).y.* | ✓ | v1.30.0 to v2.0.0 | + | x.(y-1).* | x.(y+1).* | X | v1.3.3 to v1.5.0 | + | x.(y-2).* | x.(y+1).* | X | v1.2.6 to v1.5.0 | + | x.y.* | x.(y-1).* | X | v1.6.0 to v1.5.0 | + | x.y.* | x.y.(*-1) | X | v1.5.1 to v1.5.0 | + +[^lastMinorVersion]: Longhorn only allows upgrades from any patch version of the last minor release before the new major version. (For example, v1.30.* is allowed to upgrade to v2.0.*, given that v1.30 is the last minor release branch before 2.0.) + +> **Warning**: +> * Upgrade path enforcement is introduced in Longhorn v1.5.0, which means that downgrading from v1.5.0 to any previous version is possible. **Please note that downgrading is not supported**. + +# Upgrading Longhorn + +There are normally two steps in the upgrade process: first upgrade Longhorn manager to the latest version, then manually upgrade the Longhorn engine to the latest version using the latest Longhorn manager. + +## 1. Upgrade Longhorn manager + +- To upgrade from v1.4.x, see [this section.](./longhorn-manager) + +## 2. Manually Upgrade Longhorn Engine + +After Longhorn Manager is upgraded, Longhorn Engine also needs to be upgraded [using the Longhorn UI.](./upgrade-engine) + +## 3. Automatically Upgrade Longhorn Engine + +Since Longhorn v1.1.1, we provide an option to help you [automatically upgrade engines](./auto-upgrade-engine) + +## 4. 
Automatically Migrate Recurring Jobs + +With the introduction of the new label-driven `Recurring Job` feature, Longhorn has removed the `RecurringJobs` field in the Volume Spec and planned to deprecate `RecurringJobs` in the StorageClass. + +During the upgrade, Longhorn will automatically: +- Create new recurring job CRs from the `recurringJobs` field in Volume Spec and convert them to the volume labels. +- Create new recurring job CRs from the `recurringJobs` in the StorageClass and convert them to the new `recurringJobSelector` parameter. + +Visit [Recurring Snapshots and Backups](../../snapshots-and-backups/scheduling-backups-and-snapshots) for more information about the new `Recurring Job` feature. + +# Extended Reading + +Visit [Some old instance manager pods are still running after upgrade](https://longhorn.io/kb/troubleshooting-some-old-instance-manager-pods-are-still-running-after-upgrade) for more information about the cleanup strategy of instance manager pods during upgrade. + +# Need Help? + +If you have any issues, please report it at +https://github.com/longhorn/longhorn/issues and include your backup yaml files +as well as manager logs. diff --git a/content/docs/1.5.1/deploy/upgrade/auto-upgrade-engine.md b/content/docs/1.5.1/deploy/upgrade/auto-upgrade-engine.md new file mode 100644 index 000000000..4bb464ee8 --- /dev/null +++ b/content/docs/1.5.1/deploy/upgrade/auto-upgrade-engine.md @@ -0,0 +1,45 @@ +--- +title: Automatically Upgrading Longhorn Engine +weight: 3 +--- + +Since Longhorn v1.1.1, we provide an option to help you automatically upgrade Longhorn volumes to the new default engine version after upgrading Longhorn manager. +This feature reduces the amount of manual work you have to do when upgrading Longhorn. +There are a few concepts related to this feature as listed below: + +#### 1. Concurrent Automatic Engine Upgrade Per Node Limit Setting + +This is a setting that controls how Longhorn automatically upgrades volumes' engines to the new default engine image after upgrading Longhorn manager. +The value of this setting specifies the maximum number of engines per node that are allowed to upgrade to the default engine image at the same time. +If the value is 0, Longhorn will not automatically upgrade volumes' engines to the default version. +The bigger this value is, the faster the engine upgrade process finishes. + +However, giving a bigger value for this setting will consume more CPU and memory of the node during the engine upgrade process. +We recommend setting the value to 3 to leave some room for error but don't overwhelm the system with too many failed upgrades. + +#### 2. The behavior of Longhorn with different volume conditions. +In the following cases, assume that the `concurrent automatic engine upgrade per node limit` setting is bigger than 0. + +1. Attached Volumes + + If the volume is in attached state and healthy, Longhorn will automatically do a live upgrade for the volume's engine to the new default engine image. + +1. Detached Volumes + + Longhorn automatically does an offline upgrade for detached volume. + +1. Disaster Recovery Volumes + + Longhorn doesn't automatically upgrade [disaster recovery volumes](../../../snapshots-and-backups/setup-disaster-recovery-volumes/) to the new default engine image because it would trigger a full restoration for the disaster recovery volumes. +The full restoration might affect the performance of other running Longhorn volumes in the system. 
+So, Longhorn leaves it to you to decide when it is a good time to manually upgrade the engine for disaster recovery volumes (e.g., when the system is idle or during a maintenance window).
+
+   However, when you activate the disaster recovery volume, it will be activated and then detached.
+At this time, Longhorn will automatically do an offline upgrade for the volume, similar to the detached volume case.
+
+#### 3. What Happens If the Upgrade Fails?
+If a volume fails to upgrade its engine, the engine image in the volume's spec will remain different from the engine image in the volume's status.
+Longhorn will continuously retry the upgrade until it succeeds.
+
+If there are too many volumes that fail to upgrade per node (i.e., more than the `concurrent automatic engine upgrade per node limit` setting),
+Longhorn will stop upgrading volumes on that node.
diff --git a/content/docs/1.5.1/deploy/upgrade/longhorn-manager.md b/content/docs/1.5.1/deploy/upgrade/longhorn-manager.md
new file mode 100644
index 000000000..fdff37a73
--- /dev/null
+++ b/content/docs/1.5.1/deploy/upgrade/longhorn-manager.md
@@ -0,0 +1,142 @@
+---
+title: Upgrading Longhorn Manager
+weight: 1
+---
+
+### Upgrading from v1.4.x
+
+We only support upgrading to v{{< current-version >}} from v1.4.x. For other versions, please upgrade to v1.4.x first.
+
+Engine live upgrade is supported from v1.4.x to v{{< current-version >}}.
+
+For airgap upgrades when Longhorn is installed as a Rancher app, you will need to modify the image names and remove the registry URL part.
+
+For example, the image `registry.example.com/longhorn/longhorn-manager:v{{< current-version >}}` is changed to `longhorn/longhorn-manager:v{{< current-version >}}` in the Longhorn images section. For more information, see the air gap installation steps [here.](../../../advanced-resources/deploy/airgap/#using-a-rancher-app)
+
+#### Preparing for the Upgrade
+
+If Longhorn was installed using a Helm Chart, or if it was installed as a Rancher catalog app, check to make sure the parameters in the default StorageClass weren't changed. Changing the default StorageClass's parameters might result in a chart upgrade failure. If you want to reconfigure the parameters in the StorageClass, you can copy the default StorageClass's configuration to create another StorageClass.
+
+  The current default StorageClass has the following parameters:
+
+    parameters:
+      numberOfReplicas: <user specified replica count>
+      staleReplicaTimeout: "30"
+      fromBackup: ""
+      baseImage: ""
+
+#### Upgrade
+
+> **Prerequisite:** Always back up volumes before upgrading. If anything goes wrong, you can restore the volume using the backup.
+
+#### Upgrade as a Rancher Catalog App
+
+To upgrade the Longhorn App, first determine which Rancher UI the existing Longhorn App was installed with. There are two Rancher UIs: the Cluster Manager (old UI) and the Cluster Explorer (new UI). The Longhorn Apps in the two UIs are considered two different applications by Rancher, and one cannot be upgraded to the other. If you installed Longhorn in the Cluster Manager, you need to use the Cluster Manager to upgrade Longhorn to a newer version, and vice versa for the Cluster Explorer.
+
+> Note: Because the Cluster Manager (old UI) is being deprecated, we provide instructions for migrating an existing Longhorn installation to the Longhorn chart in the Cluster Explorer (new UI) [here](https://longhorn.io/kb/how-to-migrate-longhorn-chart-installed-in-old-rancher-ui-to-the-chart-in-new-rancher-ui/).
+
+Screenshots of the two Rancher UIs:
+- The Cluster Manager (old UI) +{{< figure src="/img/screenshots/install/cluster-manager.png" >}} +- The Cluster Explorer (new UI) +{{< figure src="/img/screenshots/install/cluster-explorer.png" >}} + +#### Upgrade with Kubectl + +To upgrade with kubectl, run this command: + +``` +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/longhorn.yaml +``` + +#### Upgrade with Helm + +To upgrade with Helm, run this command: + +``` +helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --version {{< current-version >}} +``` + +On Kubernetes clusters managed by Rancher 2.1 or newer, the steps to upgrade the catalog app `longhorn-system` are the similar to the installation steps. + +Then wait for all the pods to become running and Longhorn UI working. e.g.: + +``` +$ kubectl -n longhorn-system get pod +NAME READY STATUS RESTARTS AGE +engine-image-ei-4dbdb778-nw88l 1/1 Running 0 4m29s +longhorn-ui-b7c844b49-jn5g6 1/1 Running 0 75s +longhorn-manager-z2p8h 1/1 Running 0 71s +instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc 1/1 Running 0 65s +longhorn-driver-deployer-6bd59c9f76-jp6pg 1/1 Running 0 75s +engine-image-ei-df38d2e5-zccq5 1/1 Running 0 65s +csi-snapshotter-588457fcdf-h2lgc 1/1 Running 0 30s +csi-resizer-6d8cf5f99f-8v4sp 1/1 Running 1 (30s ago) 37s +csi-snapshotter-588457fcdf-6pgf4 1/1 Running 0 30s +csi-provisioner-869bdc4b79-7ddwd 1/1 Running 1 (30s ago) 44s +csi-snapshotter-588457fcdf-p4kkn 1/1 Running 0 30s +csi-attacher-7bf4b7f996-mfbdn 1/1 Running 1 (30s ago) 50s +csi-provisioner-869bdc4b79-4dc7n 1/1 Running 1 (30s ago) 43s +csi-resizer-6d8cf5f99f-vnspd 1/1 Running 1 (30s ago) 37s +csi-attacher-7bf4b7f996-hrs7w 1/1 Running 1 (30s ago) 50s +csi-attacher-7bf4b7f996-rt2s9 1/1 Running 1 (30s ago) 50s +csi-resizer-6d8cf5f99f-7vv89 1/1 Running 1 (30s ago) 37s +csi-provisioner-869bdc4b79-sn6zr 1/1 Running 1 (30s ago) 43s +longhorn-csi-plugin-b2zzj 2/2 Running 0 24s +``` + +Next, [upgrade Longhorn engine.](../upgrade-engine) + +### Upgrading from Unsupported Versions + +We only support upgrading to v{{< current-version >}} from v1.4.x. For other versions, please upgrade to v1.4.x first. + +If you attempt to upgrade from an unsupported version, the upgrade will fail. When encountering an upgrade failure, please consider the following scenarios to recover the state based on different upgrade methods. + +#### Upgrade with Kubectl + +When you upgrade with kubectl by running this command: + +```shell +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/longhorn.yaml +``` + +Longhorn will block the upgrade process and provide the failure reason in the logs of the `longhorn-manager` pod. +During the upgrade failure, the user's Longhorn system should remain intact without any impacts except `longhorn-manager` daemon set. + +To recover, you need to apply the manifest of the previously installed version using the following command: + +```shell +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/[previous installed version]/deploy/longhorn.yaml +``` + +Besides, users might need to delete new components introduced by the new version manually. + +#### Upgrade with Helm or Rancher App Marketplace + +To prevent any impact caused by failed upgrades from unsupported versions, Longhorn will automatically initiate a new job (`pre-upgrade`) to verify if the upgrade path is supported before upgrading when upgrading through `Helm` or `Rancher App Marketplace`. 
+
+The `pre-upgrade` job will block the upgrade process and provide the failure reason in the logs of the pod.
+During the upgrade failure, the user's Longhorn system should remain intact without any impacts.
+
+To recover, run the commands below to roll back to the previously installed revision:
+
+```shell
+# get the previously installed Longhorn REVISION
+helm history longhorn
+helm rollback longhorn [REVISION]
+
+# or
+helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --version [previous installed version]
+```
+
+If you upgraded through the `Rancher App Marketplace`, recover by upgrading to the previously installed version there again.
+
+### Troubleshooting
+1. Error: `"longhorn" is invalid: provisioner: Forbidden: updates to provisioner are forbidden.`
+- This means some modifications have been applied to the default StorageClass, and you need to clean up the old one before the upgrade.
+
+- To clean up the deprecated StorageClass, run this command:
+    ```
+    kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/examples/storageclass.yaml
+    ```
diff --git a/content/docs/1.5.1/deploy/upgrade/upgrade-engine.md b/content/docs/1.5.1/deploy/upgrade/upgrade-engine.md
new file mode 100644
index 000000000..5ab9d6a5e
--- /dev/null
+++ b/content/docs/1.5.1/deploy/upgrade/upgrade-engine.md
@@ -0,0 +1,42 @@
+---
+title: Manually Upgrading Longhorn Engine
+weight: 2
+---
+
+In this section, you'll learn how to manually upgrade the Longhorn Engine from the Longhorn UI.
+
+### Prerequisites
+
+Always make backups before upgrading the Longhorn engine images.
+
+Upgrade the Longhorn manager before upgrading the Longhorn engine.
+
+### Offline Upgrade
+
+Follow these steps if the live upgrade is not available, or if the volume is stuck in a degraded state:
+
+1. Follow [the detach procedure for relevant workloads](../../../volumes-and-nodes/detaching-volumes).
+2. Select all the volumes using batch selection. Click the batch operation button **Upgrade Engine**, and choose the engine image available in the list. It's the default engine image shipped with the manager for this release.
+3. Resume all workloads. Any volume not part of a Kubernetes workload must be attached from the Longhorn UI.
+
+### Live upgrade
+
+Live upgrade is supported for upgrading from v1.4.x to v{{< current-version >}}.
+
+The `iSCSI` frontend does not support live upgrades.
+
+Live upgrade should only be done with healthy volumes.
+
+1. Select the volume you want to upgrade.
+2. Click `Upgrade Engine` in the drop-down.
+3. Select the engine image you want to upgrade to.
+    1. Normally it's the only engine image in the list, since the UI excludes the current image from the list.
+4. Click OK.
+
+During the live upgrade, the user will temporarily see twice the number of replicas. After the upgrade completes, the user should see the same number of replicas as before, and the `Engine Image` field of the volume should be updated.
+
+Note that after the live upgrade, Rancher or Kubernetes will still show the old image version for the engine and the new version for the replicas. This is expected. The upgrade succeeded if you see the new image version listed as the volume image on the Volume Detail page.
+
+### Clean up the old image
+
+After you have upgraded all the images, select `Settings/Engine Image` from the Longhorn UI. You should now be able to remove the non-default image.
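+
+As an optional check (a suggestion, not part of the official procedure), you can also list the engine images known to Longhorn from the CLI before removing anything:
+
+```
+kubectl -n longhorn-system get engineimages.longhorn.io
+```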
diff --git a/content/docs/1.5.1/high-availability/_index.md b/content/docs/1.5.1/high-availability/_index.md new file mode 100644 index 000000000..808148884 --- /dev/null +++ b/content/docs/1.5.1/high-availability/_index.md @@ -0,0 +1,4 @@ +--- +title: High Availability +weight: 3 +--- \ No newline at end of file diff --git a/content/docs/1.5.1/high-availability/auto-balance-replicas.md b/content/docs/1.5.1/high-availability/auto-balance-replicas.md new file mode 100644 index 000000000..d98d74fcc --- /dev/null +++ b/content/docs/1.5.1/high-availability/auto-balance-replicas.md @@ -0,0 +1,106 @@ +--- + title: Auto Balance Replicas + weight: 1 +--- + +When replicas are scheduled unevenly on nodes or zones, Longhorn `Replica Auto Balance` setting enables the replicas for automatic balancing when a new node is available to the cluster. + +## Replica Auto Balance Settings + +### Global setting +Longhorn supports 3 options for global replica auto-balance setting: + +- `disabled`. This is the default option, no replica auto-balance will be done. + +- `least-effort`. This option instructs Longhorn to balance replicas for minimal redundancy. + For example, after adding node-2, a volume with 4 off-balanced replicas will only rebalance 1 replica. + ``` + node-1 + +-- replica-a + +-- replica-b + +-- replica-c + node-2 + +-- replica-d + ``` + +- `best-effort`. This option instructs Longhorn to try balancing replicas for even redundancy. + For example, after adding node-2, a volume with 4 off-balanced replicas will rebalance 2 replicas. + ``` + node-1 + +-- replica-a + +-- replica-b + node-2 + +-- replica-c + +-- replica-d + ``` + Longhorn does not forcefully re-schedule the replicas to a zone that does not have enough nodes + to support even balance. Instead, Longhorn will re-schedule to balance at the node level. + +### Volume specific setting +Longhorn also supports setting individual volume for `Replica Auto Balance`. The setting can be specified in `volume.spec.replicaAutoBalance`, this overrules the global setting. + +There are 4 options available for individual volume setting: + +- `Ignored`. This is the default option that instructs Longhorn to inherit from the global setting. + +- `disabled`. This option instructs Longhorn no replica auto-balance should be done. + +- `least-effort`. This option instructs Longhorn to balance replicas for minimal redundancy. + For example, after adding node-2, a volume with 4 off-balanced replicas will only rebalance 1 replica. + ``` + node-1 + +-- replica-a + +-- replica-b + +-- replica-c + node-2 + +-- replica-d + ``` + +- `best-effort`. This option instructs Longhorn to try balancing replicas for even redundancy. + For example, after adding node-2, a volume with 4 off-balanced replicas will rebalance 2 replicas. + ``` + node-1 + +-- replica-a + +-- replica-b + node-2 + +-- replica-c + +-- replica-d + ``` + Longhorn does not forcefully re-schedule the replicas to a zone that does not have enough nodes + to support even balance. Instead, Longhorn will re-schedule to balance at the node level. + + +## How to Set Replica Auto Balance For Volumes + +There are 3 ways to set `Replica Auto Balance` for Longhorn volumes: + +### Change the global setting + +You can change the global default setting for `Replica Auto Balance` inside Longhorn UI settings. +The global setting only functions as a default value, similar to the replica count. +It doesn't change any existing volume settings. 
+When a volume is created without specifying `Replica Auto Balance`, Longhorn will automatically set to `ignored` to inherit from the global setting. + +### Set individual volumes to auto-balance replicas using the Longhorn UI + +You can change the `Replica Auto Balance` setting for individual volume after creation on the volume detail page, or do multiple updates on the listed volume page. + +### Set individual volumes to auto-balance replicas using a StorageClass +Longhorn also exposes the `Replica Auto Balance` setting as a parameter in a StorageClass. +You can create a StorageClass with a specified `Replica Auto Balance` setting, then create PVCs using this StorageClass. + +For example, the below YAML file defines a StorageClass which tells the Longhorn CSI driver to set the `Replica Auto Balance` to `least-effort`: + +```yaml +kind: StorageClass +apiVersion: storage.k8s.io/v1 +metadata: + name: hyper-converged +provisioner: driver.longhorn.io +allowVolumeExpansion: true +parameters: + numberOfReplicas: "3" + replicaAutoBalance: "least-effort" + staleReplicaTimeout: "2880" # 48 hours in minutes + fromBackup: "" +``` diff --git a/content/docs/1.5.1/high-availability/data-locality.md b/content/docs/1.5.1/high-availability/data-locality.md new file mode 100644 index 000000000..ebd487154 --- /dev/null +++ b/content/docs/1.5.1/high-availability/data-locality.md @@ -0,0 +1,59 @@ +--- + title: Data Locality + weight: 1 +--- + +The data locality setting is intended to be enabled in situations where at least one replica of a Longhorn volume should be scheduled on the same node as the pod that uses the volume, whenever it is possible. We refer to the property of having a local replica as having `data locality`. + +For example, data locality can be useful when the cluster's network is bad, because having a local replica increases the availability of the volume. + +Data locality can also be useful for distributed applications (e.g. databases), in which high availability is achieved at the application level instead of the volume level. In that case, only one volume is needed for each pod, so each volume should be scheduled on the same node as the pod that uses it. In addition, the default Longhorn behavior for volume scheduling could cause a problem for distributed applications. The problem is that if there are two replicas of a pod, and each pod replica has one volume each, Longhorn is not aware that those volumes have the same data and should not be scheduled on the same node. Therefore Longhorn could schedule identical replicas on the same node, therefore preventing them from providing high availability for the workload. + +When data locality is disabled, a Longhorn volume can be backed by replicas on any nodes in the cluster and accessed by a pod running on any node in the cluster. + +## Data Locality Settings + +Longhorn currently supports two modes for data locality settings: + +- `disabled`: This is the default option. There may or may not be a replica on the same node as the attached volume (workload). + +- `best-effort`: This option instructs Longhorn to try to keep a replica on the same node as the attached volume (workload). Longhorn will not stop the volume, even if it cannot keep a replica local to the attached volume (workload) due to an environment limitation, e.g. not enough disk space, incompatible disk tags, etc. 
+
+- `strict-local`: This option instructs Longhorn to keep the volume's **only replica** on the same node as the attached volume (workload), and therefore offers higher IOPS and lower latency.
+
+
+## How to Set Data Locality For Volumes
+
+There are three ways to set data locality for Longhorn volumes:
+
+### Change the default global setting
+
+You can change the global default setting for data locality inside Longhorn UI settings.
+The global setting only functions as a default value, similar to the replica count.
+It doesn't change any existing volume's settings.
+When a volume is created without specifying data locality, Longhorn will use the global default setting to determine data locality for the volume.
+
+### Change data locality for an individual volume using the Longhorn UI
+
+You can use the Longhorn UI to set data locality for a volume upon creation.
+You can also change the data locality setting for the volume after creation on the volume detail page.
+
+### Set the data locality for individual volumes using a StorageClass
+Longhorn also exposes the data locality setting as a parameter in a StorageClass.
+You can create a StorageClass with a specified data locality setting, then create PVCs using the StorageClass.
+For example, the below YAML file defines a StorageClass which tells the Longhorn CSI driver to set the data locality to `best-effort`:
+
+```yaml
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+  name: hyper-converged
+provisioner: driver.longhorn.io
+allowVolumeExpansion: true
+parameters:
+  numberOfReplicas: "2"
+  dataLocality: "best-effort"
+  staleReplicaTimeout: "2880" # 48 hours in minutes
+  fromBackup: ""
+```
+
diff --git a/content/docs/1.5.1/high-availability/k8s-cluster-autoscaler.md b/content/docs/1.5.1/high-availability/k8s-cluster-autoscaler.md
new file mode 100644
index 000000000..55ceba28a
--- /dev/null
+++ b/content/docs/1.5.1/high-availability/k8s-cluster-autoscaler.md
@@ -0,0 +1,20 @@
+---
+ title: Kubernetes Cluster Autoscaler Support (Experimental)
+ weight: 1
+---
+
+By default, Longhorn blocks Kubernetes Cluster Autoscaler from scaling down nodes because:
+- Longhorn creates PodDisruptionBudgets for all engine and replica instance-manager pods.
+- Longhorn instance manager pods have strict PodDisruptionBudgets.
+- Longhorn instance manager pods are not backed by a Kubernetes built-in workload controller.
+- Longhorn pods are using local storage volume mounts.
+
+For more information, see [What types of pods can prevent CA from removing a node?](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node)
+
+If you want to unblock Kubernetes Cluster Autoscaler scaling, you can enable the setting [Kubernetes Cluster Autoscaler Enabled](../../references/settings#kubernetes-cluster-autoscaler-enabled-experimental).
+
+When this setting is enabled, Longhorn will retain as few instance-manager PodDisruptionBudgets as possible. Each volume will have at least one replica under the protection of an instance-manager PodDisruptionBudget, while no redundant PodDisruptionBudget blocks the Cluster Autoscaler from scaling down.
+
+When this setting is enabled, Longhorn will also add the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation to Longhorn workloads that are not backed by a Kubernetes built-in workload controller or are using local storage mounts.
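+
+For illustration only, the annotation Longhorn adds has the following form (shown here on a hypothetical pod manifest; Longhorn applies it automatically, so you do not need to set it yourself):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: example-workload          # hypothetical pod name, for illustration only
+  annotations:
+    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
+spec:
+  containers:
+  - name: app
+    image: busybox
+```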
+
+> **Warning:** Replica rebuilding could be expensive because nodes with reusable replicas could get removed by the Kubernetes Cluster Autoscaler.
diff --git a/content/docs/1.5.1/high-availability/node-failure.md b/content/docs/1.5.1/high-availability/node-failure.md
new file mode 100644
index 000000000..741b480ab
--- /dev/null
+++ b/content/docs/1.5.1/high-availability/node-failure.md
@@ -0,0 +1,40 @@
+---
+title: Node Failure Handling with Longhorn
+weight: 2
+---
+
+## What to expect when a Kubernetes Node fails
+
+This section aims to inform users of what happens during a node failure and what to expect during recovery.
+
+After **one minute**, `kubectl get nodes` will report `NotReady` for the failed node.
+
+After about **five minutes**, the states of all the pods on the `NotReady` node will change to either `Unknown` or `NodeLost`.
+
+StatefulSets have a stable identity, so Kubernetes won't force delete the pod for the user. See the [official Kubernetes documentation about forcing the deletion of a StatefulSet](https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/).
+
+Deployments don't have a stable identity, but because ReadWriteOnce (RWO) storage cannot be attached to two nodes at the same time, the new pod created by Kubernetes won't be able to start, since the RWO volume is still attached to the old pod on the lost node.
+
+In both cases, Kubernetes will automatically evict the pod (set a deletion timestamp for the pod) on the lost node, then try to **recreate a new one with the old volumes**. Because the evicted pod gets stuck in the `Terminating` state and the attached volumes cannot be released or reused, the new pod will get stuck in the `ContainerCreating` state unless there is intervention from the admin or the storage software.
+
+## Longhorn Pod Deletion Policy When Node is Down
+
+Longhorn provides an option to help users automatically force delete terminating pods of StatefulSets/Deployments on the node that is down. After force deleting, Kubernetes will detach the Longhorn volume and spin up replacement pods on a new node.
+
+You can find more detail about the options for the `Pod Deletion Policy When Node is Down` setting in the **Settings** tab of the Longhorn UI or in the [settings reference](../../references/settings/#pod-deletion-policy-when-node-is-down).
+
+## What to expect when a failed Kubernetes Node recovers
+
+If the node is back online within 5 to 6 minutes of the failure, Kubernetes will restart pods, unmount, and re-mount volumes without volume re-attaching and VolumeAttachment cleanup.
+
+Because the volume engines would be down after the node is down, this direct remount won't work since the device no longer exists on the node.
+
+In this case, Longhorn will detach and re-attach the volumes to recover the volume engines, so that the pods can remount/reuse the volumes safely.
+
+If the node is not back online within 5 to 6 minutes of the failure, Kubernetes will try to delete all unreachable pods based on the pod eviction mechanism, and these pods will be in a `Terminating` state. See [pod eviction timeout](https://kubernetes.io/docs/concepts/architecture/nodes/#condition) for details.
+
+Then, if the failed node is recovered later, Kubernetes will restart those terminating pods, detach the volumes, wait for the old VolumeAttachment cleanup, and reuse (re-attach and re-mount) the volumes. Typically, these steps may take 1 to 7 minutes.
+ +In this case, detaching and re-attaching operations are already included in the Kubernetes recovery procedures. Hence no extra operation is needed and the Longhorn volumes will be available after the above steps. + +For all above recovery scenarios, Longhorn will handle those steps automatically with the association of Kubernetes. diff --git a/content/docs/1.5.1/high-availability/recover-volume.md b/content/docs/1.5.1/high-availability/recover-volume.md new file mode 100644 index 000000000..0c4a7a2c6 --- /dev/null +++ b/content/docs/1.5.1/high-availability/recover-volume.md @@ -0,0 +1,13 @@ +--- + title: Recover Volume after Unexpected Detachment + weight: 1 +--- + +When an unexpected detachment happens, which can happen during a [Kubernetes upgrade](https://github.com/longhorn/longhorn/issues/703), a [Docker reboot](https://github.com/longhorn/longhorn/issues/686), or a network disconnection, +Longhorn automatically deletes the workload pod if the pod is managed by a controller (e.g. deployment, statefulset, daemonset, etc...). +By deleting the pod, its controller restarts the pod and Kubernetes handles volume reattachment and remount. + +If you don't want Longhorn to automatically delete the workload pod, you can set it in [the setting `Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly`](../../references/settings#automatically-delete-workload-pod-when-the-volume-is-detached-unexpectedly) in Longhorn UI. + +For the pods that don't have a controller, Longhorn doesn't delete them because if Longhorn does, no one will restart them. +To recover unexpectedly detached volumes, you would have to manually delete and recreate the pods that don't have a controller. diff --git a/content/docs/1.5.1/monitoring/_index.md b/content/docs/1.5.1/monitoring/_index.md new file mode 100644 index 000000000..83b80e80a --- /dev/null +++ b/content/docs/1.5.1/monitoring/_index.md @@ -0,0 +1,10 @@ +--- +title: Monitoring +weight: 3 +--- + +* Setting up Prometheus and Grafana to monitor Longhorn +* Integrating Longhorn metrics into the Rancher monitoring system +* Longhorn Metrics for Monitoring +* Support Kubelet Volume Metrics +* Longhorn Alert Rule Examples diff --git a/content/docs/1.5.1/monitoring/alert-rules-example.md b/content/docs/1.5.1/monitoring/alert-rules-example.md new file mode 100644 index 000000000..879f0a8a7 --- /dev/null +++ b/content/docs/1.5.1/monitoring/alert-rules-example.md @@ -0,0 +1,103 @@ +--- +title: Longhorn Alert Rule Examples +weight: 5 +--- + +We provide a couple of example Longhorn alert rules below for your references. +See [here](../metrics) for a list of all available Longhorn metrics and build your own alert rules. + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: PrometheusRule +metadata: + labels: + prometheus: longhorn + role: alert-rules + name: prometheus-longhorn-rules + namespace: monitoring +spec: + groups: + - name: longhorn.rules + rules: + - alert: LonghornVolumeActualSpaceUsedWarning + annotations: + description: The actual space used by Longhorn volume {{$labels.volume}} on {{$labels.node}} is at {{$value}}% capacity for + more than 5 minutes. + summary: The actual used space of Longhorn volume is over 90% of the capacity. + expr: (longhorn_volume_actual_size_bytes / longhorn_volume_capacity_bytes) * 100 > 90 + for: 5m + labels: + issue: The actual used space of Longhorn volume {{$labels.volume}} on {{$labels.node}} is high. 
+ severity: warning + - alert: LonghornVolumeStatusCritical + annotations: + description: Longhorn volume {{$labels.volume}} on {{$labels.node}} is Fault for + more than 2 minutes. + summary: Longhorn volume {{$labels.volume}} is Fault + expr: longhorn_volume_robustness == 3 + for: 5m + labels: + issue: Longhorn volume {{$labels.volume}} is Fault. + severity: critical + - alert: LonghornVolumeStatusWarning + annotations: + description: Longhorn volume {{$labels.volume}} on {{$labels.node}} is Degraded for + more than 5 minutes. + summary: Longhorn volume {{$labels.volume}} is Degraded + expr: longhorn_volume_robustness == 2 + for: 5m + labels: + issue: Longhorn volume {{$labels.volume}} is Degraded. + severity: warning + - alert: LonghornNodeStorageWarning + annotations: + description: The used storage of node {{$labels.node}} is at {{$value}}% capacity for + more than 5 minutes. + summary: The used storage of node is over 70% of the capacity. + expr: (longhorn_node_storage_usage_bytes / longhorn_node_storage_capacity_bytes) * 100 > 70 + for: 5m + labels: + issue: The used storage of node {{$labels.node}} is high. + severity: warning + - alert: LonghornDiskStorageWarning + annotations: + description: The used storage of disk {{$labels.disk}} on node {{$labels.node}} is at {{$value}}% capacity for + more than 5 minutes. + summary: The used storage of disk is over 70% of the capacity. + expr: (longhorn_disk_usage_bytes / longhorn_disk_capacity_bytes) * 100 > 70 + for: 5m + labels: + issue: The used storage of disk {{$labels.disk}} on node {{$labels.node}} is high. + severity: warning + - alert: LonghornNodeDown + annotations: + description: There are {{$value}} Longhorn nodes which have been offline for more than 5 minutes. + summary: Longhorn nodes is offline + expr: (avg(longhorn_node_count_total) or on() vector(0)) - (count(longhorn_node_status{condition="ready"} == 1) or on() vector(0)) > 0 + for: 5m + labels: + issue: There are {{$value}} Longhorn nodes are offline + severity: critical + - alert: LonghornIntanceManagerCPUUsageWarning + annotations: + description: Longhorn instance manager {{$labels.instance_manager}} on {{$labels.node}} has CPU Usage / CPU request is {{$value}}% for + more than 5 minutes. + summary: Longhorn instance manager {{$labels.instance_manager}} on {{$labels.node}} has CPU Usage / CPU request is over 300%. + expr: (longhorn_instance_manager_cpu_usage_millicpu/longhorn_instance_manager_cpu_requests_millicpu) * 100 > 300 + for: 5m + labels: + issue: Longhorn instance manager {{$labels.instance_manager}} on {{$labels.node}} consumes 3 times the CPU request. + severity: warning + - alert: LonghornNodeCPUUsageWarning + annotations: + description: Longhorn node {{$labels.node}} has CPU Usage / CPU capacity is {{$value}}% for + more than 5 minutes. + summary: Longhorn node {{$labels.node}} experiences high CPU pressure for more than 5m. + expr: (longhorn_node_cpu_usage_millicpu / longhorn_node_cpu_capacity_millicpu) * 100 > 90 + for: 5m + labels: + issue: Longhorn node {{$labels.node}} experiences high CPU pressure. + severity: warning +``` + +See more about how to define alert rules at [here](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules). 
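+
+Assuming you have saved the manifest above as `prometheus-longhorn-rules.yaml` (a hypothetical file name), one way to deploy it is:
+
+```shell
+kubectl apply -f prometheus-longhorn-rules.yaml
+```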
diff --git a/content/docs/1.5.1/monitoring/integrating-with-rancher-monitoring.md b/content/docs/1.5.1/monitoring/integrating-with-rancher-monitoring.md
new file mode 100644
index 000000000..6e64144b6
--- /dev/null
+++ b/content/docs/1.5.1/monitoring/integrating-with-rancher-monitoring.md
@@ -0,0 +1,40 @@
+---
+title: Integrating Longhorn metrics into the Rancher monitoring system
+weight: 2
+---
+## About the Rancher Monitoring System
+
+Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution.
+
+See [here](https://rancher.com/docs/rancher/v2.x/en/monitoring-alerting/) for instructions on how to deploy and enable the Rancher monitoring system.
+
+## Add Longhorn Metrics to the Rancher Monitoring System
+
+If you are using Rancher to manage your Kubernetes cluster and have already enabled Rancher monitoring, you can add Longhorn metrics to Rancher monitoring by simply deploying the following ServiceMonitor:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: longhorn-prometheus-servicemonitor
+  namespace: longhorn-system
+  labels:
+    name: longhorn-prometheus-servicemonitor
+spec:
+  selector:
+    matchLabels:
+      app: longhorn-manager
+  namespaceSelector:
+    matchNames:
+    - longhorn-system
+  endpoints:
+  - port: manager
+```
+
+Once the ServiceMonitor is created, Rancher will automatically discover all Longhorn metrics.
+
+You can then set up a Grafana dashboard for visualization.
+
+You can import our prebuilt [Longhorn example dashboard](https://grafana.com/grafana/dashboards/13032) as a starting point.
+
+You can also set up alerts in the Rancher UI.
diff --git a/content/docs/1.5.1/monitoring/kubelet-volume-metrics.md b/content/docs/1.5.1/monitoring/kubelet-volume-metrics.md
new file mode 100644
index 000000000..d1e26b38b
--- /dev/null
+++ b/content/docs/1.5.1/monitoring/kubelet-volume-metrics.md
@@ -0,0 +1,31 @@
+---
+title: Kubelet Volume Metrics Support
+weight: 4
+---
+
+## About Kubelet Volume Metrics
+
+Kubelet exposes [the following metrics](https://github.com/kubernetes/kubernetes/blob/4b24dca228d61f4d13dcd57b46465b0df74571f6/pkg/kubelet/metrics/collectors/volume_stats.go#L27):
+
+1. kubelet_volume_stats_capacity_bytes
+1. kubelet_volume_stats_available_bytes
+1. kubelet_volume_stats_used_bytes
+1. kubelet_volume_stats_inodes
+1. kubelet_volume_stats_inodes_free
+1. kubelet_volume_stats_inodes_used
+
+Those metrics measure information related to a PVC's filesystem inside a Longhorn block device.
+
+They are different from [longhorn_volume_*](../metrics) metrics, which measure information specific to a Longhorn block device.
+
+You can set up a monitoring system that scrapes Kubelet metric endpoints to obtain a PVC's status and set up alerts for abnormal events, such as the PVC being about to run out of storage space.
+
+A popular monitoring setup is [prometheus-operator/kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack), which scrapes `kubelet_volume_stats_*` metrics and provides a dashboard and alert rules for them.
+
+## Longhorn CSI Plugin Support
+
+Since v1.1.0, the Longhorn CSI plugin supports the `NodeGetVolumeStats` RPC according to the [CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetvolumestats).
+
+This allows the kubelet to query the Longhorn CSI plugin for a PVC's status.
+ +The kubelet then exposes that information in `kubelet_volume_stats_*` metrics. diff --git a/content/docs/1.5.1/monitoring/metrics.md b/content/docs/1.5.1/monitoring/metrics.md new file mode 100644 index 000000000..462a56074 --- /dev/null +++ b/content/docs/1.5.1/monitoring/metrics.md @@ -0,0 +1,64 @@ +--- +title: Longhorn Metrics for Monitoring +weight: 3 +--- +## Volume + +| Name | Description | Example | +|---|---|---| +| longhorn_volume_actual_size_bytes | Actual space used by each replica of the volume on the corresponding node | longhorn_volume_actual_size_bytes{node="worker-2",volume="testvol"} 1.1917312e+08 | +| longhorn_volume_capacity_bytes | Configured size in bytes for this volume | longhorn_volume_capacity_bytes{node="worker-2",volume="testvol"} 6.442450944e+09 | +| longhorn_volume_state | State of this volume: 1=creating, 2=attached, 3=Detached, 4=Attaching, 5=Detaching, 6=Deleting | longhorn_volume_state{node="worker-2",volume="testvol"} 2 | +| longhorn_volume_robustness | Robustness of this volume: 0=unknown, 1=healthy, 2=degraded, 3=faulted | longhorn_volume_robustness{node="worker-2",volume="testvol"} 1 | +| longhorn_volume_read_throughput | Read throughput of this volume (Bytes/s) | longhorn_volume_read_throughput{node="worker-2",volume="testvol"} 5120000 | +| longhorn_volume_write_throughput | Write throughput of this volume (Bytes/s) | longhorn_volume_write_throughput{node="worker-2",volume="testvol"} 512000 | +| longhorn_volume_read_iops | Read IOPS of this volume | longhorn_volume_read_iops{node="worker-2",volume="testvol"} 100 | +| longhorn_volume_write_iops | Write IOPS of this volume | longhorn_volume_write_iops{node="worker-2",volume="testvol"} 100 | +| longhorn_volume_read_latency | Read latency of this volume (ns) | longhorn_volume_read_latency{node="worker-2",volume="testvol"} 100000 | +| longhorn_volume_write_latency | Write latency of this volume (ns) | longhorn_volume_write_latency{node="worker-2",volume="testvol"} 100000 | + +## Node + +| Name | Description | Example | +|---|---|---| +| longhorn_node_status | Status of this node: 1=true, 0=false | longhorn_node_status{condition="ready",condition_reason="",node="worker-2"} 1 | +| longhorn_node_count_total | Total number of nodes in the Longhorn system | longhorn_node_count_total 4 | +| longhorn_node_cpu_capacity_millicpu | The maximum allocatable CPU on this node | longhorn_node_cpu_capacity_millicpu{node="worker-2"} 2000 | +| longhorn_node_cpu_usage_millicpu | The CPU usage on this node | longhorn_node_cpu_usage_millicpu{node="pworker-2"} 186 | +| longhorn_node_memory_capacity_bytes | The maximum allocatable memory on this node | longhorn_node_memory_capacity_bytes{node="worker-2"} 4.031229952e+09 | +| longhorn_node_memory_usage_bytes | The memory usage on this node | longhorn_node_memory_usage_bytes{node="worker-2"} 1.833582592e+09 | +| longhorn_node_storage_capacity_bytes | The storage capacity of this node | longhorn_node_storage_capacity_bytes{node="worker-3"} 8.3987283968e+10 | +| longhorn_node_storage_usage_bytes | The used storage of this node | longhorn_node_storage_usage_bytes{node="worker-3"} 9.060941824e+09 | +| longhorn_node_storage_reservation_bytes | The reserved storage for other applications and system on this node | longhorn_node_storage_reservation_bytes{node="worker-3"} 2.519618519e+10 | + +## Disk + +| Name | Description | Example | +|---|---|---| +| longhorn_disk_capacity_bytes | The storage capacity of this disk | 
longhorn_disk_capacity_bytes{disk="default-disk-8b28ee3134628183",node="worker-3"} 8.3987283968e+10 |
+| longhorn_disk_usage_bytes | The used storage of this disk | longhorn_disk_usage_bytes{disk="default-disk-8b28ee3134628183",node="worker-3"} 9.060941824e+09 |
+| longhorn_disk_reservation_bytes | The reserved storage for other applications and system on this disk | longhorn_disk_reservation_bytes{disk="default-disk-8b28ee3134628183",node="worker-3"} 2.519618519e+10 |
+
+## Instance Manager
+
+| Name | Description | Example |
+|---|---|---|
+| longhorn_instance_manager_cpu_usage_millicpu | The CPU usage of this Longhorn instance manager | longhorn_instance_manager_cpu_usage_millicpu{instance_manager="instance-manager-e-2189ed13",instance_manager_type="engine",node="worker-2"} 80 |
+| longhorn_instance_manager_cpu_requests_millicpu | Requested CPU resources in Kubernetes for this Longhorn instance manager | longhorn_instance_manager_cpu_requests_millicpu{instance_manager="instance-manager-e-2189ed13",instance_manager_type="engine",node="worker-2"} 250 |
+| longhorn_instance_manager_memory_usage_bytes | The memory usage of this Longhorn instance manager | longhorn_instance_manager_memory_usage_bytes{instance_manager="instance-manager-e-2189ed13",instance_manager_type="engine",node="worker-2"} 2.4072192e+07 |
+| longhorn_instance_manager_memory_requests_bytes | Requested memory in Kubernetes for this Longhorn instance manager | longhorn_instance_manager_memory_requests_bytes{instance_manager="instance-manager-e-2189ed13",instance_manager_type="engine",node="worker-2"} 0 |
+| longhorn_instance_manager_proxy_grpc_connection | The number of proxy gRPC connections of this Longhorn instance manager | longhorn_instance_manager_proxy_grpc_connection{instance_manager="instance-manager-e-814dfd05", instance_manager_type="engine", node="worker-2"} 0 |
+
+## Manager
+
+| Name | Description | Example |
+|---|---|---|
+| longhorn_manager_cpu_usage_millicpu | The CPU usage of this Longhorn Manager | longhorn_manager_cpu_usage_millicpu{manager="longhorn-manager-5rx2n",node="worker-2"} 27 |
+| longhorn_manager_memory_usage_bytes | The memory usage of this Longhorn Manager | longhorn_manager_memory_usage_bytes{manager="longhorn-manager-5rx2n",node="worker-2"} 2.6144768e+07 |
+
+## Backup
+
+| Name | Description | Example |
+|---|---|---|
+| longhorn_backup_actual_size_bytes | Actual size of this backup | longhorn_backup_actual_size_bytes{backup="backup-4ab66eca0d60473e",volume="testvol"} 6.291456e+07 |
+| longhorn_backup_state | State of this backup: 0=New, 1=Pending, 2=InProgress, 3=Completed, 4=Error, 5=Unknown | longhorn_backup_state{backup="backup-4ab66eca0d60473e",volume="testvol"} 3 |
diff --git a/content/docs/1.5.1/monitoring/prometheus-and-grafana-setup.md b/content/docs/1.5.1/monitoring/prometheus-and-grafana-setup.md
new file mode 100644
index 000000000..b30cea89d
--- /dev/null
+++ b/content/docs/1.5.1/monitoring/prometheus-and-grafana-setup.md
@@ -0,0 +1,401 @@
+---
+title: Setting up Prometheus and Grafana to monitor Longhorn
+weight: 1
+---
+
+This document is a quick guide to setting up monitoring for Longhorn.
+
+Longhorn natively exposes metrics in [Prometheus text format](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format) on a REST endpoint `http://LONGHORN_MANAGER_IP:PORT/metrics`.
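+
+As a quick sanity check before wiring up a collector, you can query the endpoint manually. The sketch below assumes the manager listens on port 9500 (the longhorn-manager port listed in the [Longhorn Networking](../../references/networking) reference) and that the commands run from a host that can reach the pod network; the pod IP is a placeholder:
+
+```
+# Find a longhorn-manager pod and its IP.
+kubectl -n longhorn-system get pods -l app=longhorn-manager -o wide
+
+# Fetch a few metric lines directly from that pod (placeholder IP).
+curl -s http://<MANAGER_POD_IP>:9500/metrics | head
+```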
+ +You can use any collecting tools such as [Prometheus](https://prometheus.io/), [Graphite](https://graphiteapp.org/), [Telegraf](https://www.influxdata.com/time-series-platform/telegraf/) to scrape these metrics then visualize the collected data by tools such as [Grafana](https://grafana.com/). + +See [Longhorn Metrics for Monitoring](../metrics) for available metrics. + +## High-level Overview + +The monitoring system uses `Prometheus` for collecting data and alerting, and `Grafana` for visualizing/dashboarding the collected data. + +* Prometheus server which scrapes and stores time-series data from Longhorn metrics endpoints. The Prometheus is also responsible for generating alerts based on configured rules and collected data. Prometheus servers then send alerts to an Alertmanager. +* AlertManager then manages those alerts, including silencing, inhibition, aggregation, and sending out notifications via methods such as email, on-call notification systems, and chat platforms. +* Grafana which queries Prometheus server for data and draws a dashboard for visualization. + +The below picture describes the detailed architecture of the monitoring system. + +![images](/img/screenshots/monitoring/longhorn-monitoring-system.png) + +There are 2 unmentioned components in the above picture: + +* Longhorn Backend service is a service pointing to the set of Longhorn manager pods. Longhorn's metrics are exposed in Longhorn manager pods at the endpoint `http://LONGHORN_MANAGER_IP:PORT/metrics`. +* [Prometheus operator](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md) makes running Prometheus on top of Kubernetes very easy. The operator watches 3 custom resources: ServiceMonitor, Prometheus ,and AlertManager. + When you create those custom resources, Prometheus Operator deploys and manages the Prometheus server, AlertManager with the user-specified configurations. + +## Installation + +This document uses the `default` namespace for the monitoring system. To install on a different namespace, change the field `namespace: ` in manifests. + +### Install Prometheus Operator +Follow instructions in [Prometheus Operator - Quickstart](https://github.com/prometheus-operator/prometheus-operator#quickstart). + +> **NOTE:** You may need to choose a release that is compatible with the Kubernetes version of the cluster. + +### Install Longhorn ServiceMonitor + +1. Create a ServiceMonitor for Longhorn manager. + + ```yaml + apiVersion: monitoring.coreos.com/v1 + kind: ServiceMonitor + metadata: + name: longhorn-prometheus-servicemonitor + namespace: default + labels: + name: longhorn-prometheus-servicemonitor + spec: + selector: + matchLabels: + app: longhorn-manager + namespaceSelector: + matchNames: + - longhorn-system + endpoints: + - port: manager + ``` + + Longhorn ServiceMonitor has a label selector `app: longhorn-manager` for selecting Longhorn backend service. + + Longhorn ServiceMonitor is included in the Prometheus custom resource so that the Prometheus server can discover all Longhorn manager pods and their endpoints. + +### Install and configure Prometheus AlertManager + +1. Create a highly available Alertmanager deployment with 3 instances. + + ```yaml + apiVersion: monitoring.coreos.com/v1 + kind: Alertmanager + metadata: + name: longhorn + namespace: default + spec: + replicas: 3 + ``` + +1. The Alertmanager instances will not start unless a valid configuration is given. 
+See [Prometheus - Configuration](https://prometheus.io/docs/alerting/latest/configuration/) for more explanation. + + ```yaml + global: + resolve_timeout: 5m + route: + group_by: [alertname] + receiver: email_and_slack + receivers: + - name: email_and_slack + email_configs: + - to: + from: + smarthost: + # SMTP authentication information. + auth_username: + auth_identity: + auth_password: + headers: + subject: 'Longhorn-Alert' + text: |- + {{ range .Alerts }} + *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}` + *Description:* {{ .Annotations.description }} + *Details:* + {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}` + {{ end }} + {{ end }} + slack_configs: + - api_url: + channel: + text: |- + {{ range .Alerts }} + *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}` + *Description:* {{ .Annotations.description }} + *Details:* + {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}` + {{ end }} + {{ end }} + ``` + + Save the above Alertmanager config in a file called `alertmanager.yaml` and create a secret from it using kubectl. + + Alertmanager instances require the secret resource naming to follow the format `alertmanager-`. In the previous step, the name of the Alertmanager is `longhorn`, so the secret name must be `alertmanager-longhorn` + + ``` + $ kubectl create secret generic alertmanager-longhorn --from-file=alertmanager.yaml -n default + ``` + +1. To be able to view the web UI of the Alertmanager, expose it through a Service. A simple way to do this is to use a Service of type NodePort. + + ```yaml + apiVersion: v1 + kind: Service + metadata: + name: alertmanager-longhorn + namespace: default + spec: + type: NodePort + ports: + - name: web + nodePort: 30903 + port: 9093 + protocol: TCP + targetPort: web + selector: + alertmanager: longhorn + ``` + + After creating the above service, you can access the web UI of Alertmanager via a Node's IP and the port 30903. + + > Use the above `NodePort` service for quick verification only because it doesn't communicate over the TLS connection. You may want to change the service type to `ClusterIP` and set up an Ingress-controller to expose the web UI of Alertmanager over a TLS connection. + +### Install and configure Prometheus server + +1. Create PrometheusRule custom resource to define alert conditions. See more examples about Longhorn alert rules at [Longhorn Alert Rule Examples](../alert-rules-example). + + ```yaml + apiVersion: monitoring.coreos.com/v1 + kind: PrometheusRule + metadata: + labels: + prometheus: longhorn + role: alert-rules + name: prometheus-longhorn-rules + namespace: default + spec: + groups: + - name: longhorn.rules + rules: + - alert: LonghornVolumeUsageCritical + annotations: + description: Longhorn volume {{$labels.volume}} on {{$labels.node}} is at {{$value}}% used for + more than 5 minutes. + summary: Longhorn volume capacity is over 90% used. + expr: 100 * (longhorn_volume_usage_bytes / longhorn_volume_capacity_bytes) > 90 + for: 5m + labels: + issue: Longhorn volume {{$labels.volume}} usage on {{$labels.node}} is critical. + severity: critical + ``` + See [Prometheus - Alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules) for more information. + +1. If [RBAC](https://kubernetes.io/docs/reference/access-authn-authz/authorization/) authorization is activated, Create a ClusterRole and ClusterRoleBinding for the Prometheus Pods. 
+ + ```yaml + apiVersion: v1 + kind: ServiceAccount + metadata: + name: prometheus + namespace: default + ``` + + ```yaml + apiVersion: rbac.authorization.k8s.io/v1 + kind: ClusterRole + metadata: + name: prometheus + namespace: default + rules: + - apiGroups: [""] + resources: + - nodes + - services + - endpoints + - pods + verbs: ["get", "list", "watch"] + - apiGroups: [""] + resources: + - configmaps + verbs: ["get"] + - nonResourceURLs: ["/metrics"] + verbs: ["get"] + ``` + + ```yaml + apiVersion: rbac.authorization.k8s.io/v1 + kind: ClusterRoleBinding + metadata: + name: prometheus + roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: prometheus + subjects: + - kind: ServiceAccount + name: prometheus + namespace: default + ``` + +1. Create a Prometheus custom resource. Notice that we select the Longhorn service monitor and Longhorn rules in the spec. + + ```yaml + apiVersion: monitoring.coreos.com/v1 + kind: Prometheus + metadata: + name: longhorn + namespace: default + spec: + replicas: 2 + serviceAccountName: prometheus + alerting: + alertmanagers: + - namespace: default + name: alertmanager-longhorn + port: web + serviceMonitorSelector: + matchLabels: + name: longhorn-prometheus-servicemonitor + ruleSelector: + matchLabels: + prometheus: longhorn + role: alert-rules + ``` + +1. To be able to view the web UI of the Prometheus server, expose it through a Service. A simple way to do this is to use a Service of type NodePort. + + ```yaml + apiVersion: v1 + kind: Service + metadata: + name: prometheus-longhorn + namespace: default + spec: + type: NodePort + ports: + - name: web + nodePort: 30904 + port: 9090 + protocol: TCP + targetPort: web + selector: + prometheus: longhorn + ``` + + After creating the above service, you can access the web UI of the Prometheus server via a Node's IP and the port 30904. + + > At this point, you should be able to see all Longhorn manager targets as well as Longhorn rules in the targets and rules section of the Prometheus server UI. + + > Use the above NodePort service for quick verification only because it doesn't communicate over the TLS connection. You may want to change the service type to `ClusterIP` and set up an Ingress controller to expose the web UI of the Prometheus server over a TLS connection. + +### Setup Grafana + +1. Create Grafana datasource ConfigMap. + + ```yaml + apiVersion: v1 + kind: ConfigMap + metadata: + name: grafana-datasources + namespace: default + data: + prometheus.yaml: |- + { + "apiVersion": 1, + "datasources": [ + { + "access":"proxy", + "editable": true, + "name": "prometheus-longhorn", + "orgId": 1, + "type": "prometheus", + "url": "http://prometheus-longhorn.default.svc:9090", + "version": 1 + } + ] + } + ``` + + > **NOTE:** change field `url` if you are installing the monitoring stack in a different namespace. + > `http://prometheus-longhorn..svc:9090"` + +1. Create Grafana Deployment. 
+ ```yaml + apiVersion: apps/v1 + kind: Deployment + metadata: + name: grafana + namespace: default + labels: + app: grafana + spec: + replicas: 1 + selector: + matchLabels: + app: grafana + template: + metadata: + name: grafana + labels: + app: grafana + spec: + containers: + - name: grafana + image: grafana/grafana:7.1.5 + ports: + - name: grafana + containerPort: 3000 + resources: + limits: + memory: "500Mi" + cpu: "300m" + requests: + memory: "500Mi" + cpu: "200m" + volumeMounts: + - mountPath: /var/lib/grafana + name: grafana-storage + - mountPath: /etc/grafana/provisioning/datasources + name: grafana-datasources + readOnly: false + volumes: + - name: grafana-storage + emptyDir: {} + - name: grafana-datasources + configMap: + defaultMode: 420 + name: grafana-datasources + ``` + +1. Create Grafana Service. + ```yaml + apiVersion: v1 + kind: Service + metadata: + name: grafana + namespace: default + spec: + selector: + app: grafana + type: ClusterIP + ports: + - port: 3000 + targetPort: 3000 + ``` + +1. Expose Grafana on NodePort `32000`. + ```yaml + kubectl -n default patch svc grafana --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"replace","path":"/spec/ports/0/nodePort","value":32000}]' + ``` + + > Use the above NodePort service for quick verification only because it doesn't communicate over the TLS connection. You may want to change the service type to ClusterIP and set up an Ingress controller to expose Grafana over a TLS connection. + +1. Access the Grafana dashboard using any node IP on port `32000`. + ``` + # Default Credential + User: admin + Pass: admin + ``` + +1. Setup Longhorn dashboard. + + Once inside Grafana, import the prebuilt [Longhorn example dashboard](https://grafana.com/grafana/dashboards/17626). + + See [Grafana Lab - Export and import](https://grafana.com/docs/grafana/latest/reference/export_import/) for instructions on how to import a Grafana dashboard. + + You should see the following dashboard at successful setup: + ![images](/img/screenshots/monitoring/longhorn-example-grafana-dashboard.png) + diff --git a/content/docs/1.5.1/references/_index.md b/content/docs/1.5.1/references/_index.md new file mode 100644 index 000000000..a4d3afdf8 --- /dev/null +++ b/content/docs/1.5.1/references/_index.md @@ -0,0 +1,4 @@ +--- +title: References +weight: 1 +--- diff --git a/content/docs/1.5.1/references/examples.md b/content/docs/1.5.1/references/examples.md new file mode 100644 index 000000000..3be5143b0 --- /dev/null +++ b/content/docs/1.5.1/references/examples.md @@ -0,0 +1,432 @@ +--- +title: Examples +weight: 4 +--- + +For reference, this page provides examples of Kubernetes resources that use Longhorn storage. 
+ +- [Block volume](#block-volume) +- [CSI persistent volume](#csi-persistent-volume) +- [Deployment](#deployment) +- [Pod with PersistentVolumeClaim](#pod-with-persistentvolumeclaim) +- [Restore to file](#restore-to-file) +- [Simple Pod](#simple-pod) +- [Simple PersistentVolumeClaim](#simple-persistentvolumeclaim) +- [StatefulSet](#statefulset) +- [StorageClass](#storageclass) + +### Block Volume + + + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: longhorn-block-vol + spec: + accessModes: + - ReadWriteOnce + volumeMode: Block + storageClassName: longhorn + resources: + requests: + storage: 2Gi + --- + apiVersion: v1 + kind: Pod + metadata: + name: block-volume-test + namespace: default + spec: + containers: + - name: block-volume-test + image: nginx:stable-alpine + imagePullPolicy: IfNotPresent + volumeDevices: + - devicePath: /dev/longhorn/testblk + name: block-vol + ports: + - containerPort: 80 + volumes: + - name: block-vol + persistentVolumeClaim: + claimName: longhorn-block-vol + + + +### CSI Persistent Volume + + apiVersion: v1 + kind: PersistentVolume + metadata: + name: longhorn-vol-pv + spec: + capacity: + storage: 2Gi + volumeMode: Filesystem + accessModes: + - ReadWriteOnce + persistentVolumeReclaimPolicy: Delete + storageClassName: longhorn + csi: + driver: driver.longhorn.io + fsType: ext4 + volumeAttributes: + numberOfReplicas: '3' + staleReplicaTimeout: '2880' + volumeHandle: existing-longhorn-volume + --- + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: longhorn-vol-pvc + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 2Gi + volumeName: longhorn-vol-pv + storageClassName: longhorn + --- + apiVersion: v1 + kind: Pod + metadata: + name: volume-pv-test + namespace: default + spec: + restartPolicy: Always + containers: + - name: volume-pv-test + image: nginx:stable-alpine + imagePullPolicy: IfNotPresent + livenessProbe: + exec: + command: + - ls + - /data/lost+found + initialDelaySeconds: 5 + periodSeconds: 5 + volumeMounts: + - name: vol + mountPath: /data + ports: + - containerPort: 80 + volumes: + - name: vol + persistentVolumeClaim: + claimName: longhorn-vol-pvc + + +### Deployment + + + apiVersion: v1 + kind: Service + metadata: + name: mysql + labels: + app: mysql + spec: + ports: + - port: 3306 + selector: + app: mysql + clusterIP: None + --- + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: mysql-pvc + spec: + accessModes: + - ReadWriteOnce + storageClassName: longhorn + resources: + requests: + storage: 2Gi + --- + apiVersion: apps/v1 + kind: Deployment + metadata: + name: mysql + labels: + app: mysql + spec: + selector: + matchLabels: + app: mysql # has to match .spec.template.metadata.labels + strategy: + type: Recreate + template: + metadata: + labels: + app: mysql + spec: + restartPolicy: Always + containers: + - image: mysql:5.6 + name: mysql + livenessProbe: + exec: + command: + - ls + - /var/lib/mysql/lost+found + initialDelaySeconds: 5 + periodSeconds: 5 + env: + - name: MYSQL_ROOT_PASSWORD + value: changeme + ports: + - containerPort: 3306 + name: mysql + volumeMounts: + - name: mysql-volume + mountPath: /var/lib/mysql + volumes: + - name: mysql-volume + persistentVolumeClaim: + claimName: mysql-pvc + + +### Pod with PersistentVolumeClaim + + + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: longhorn-volv-pvc + spec: + accessModes: + - ReadWriteOnce + storageClassName: longhorn + resources: + requests: + storage: 2Gi + --- + apiVersion: v1 + kind: Pod + metadata: 
+ name: volume-test + namespace: default + spec: + restartPolicy: Always + containers: + - name: volume-test + image: nginx:stable-alpine + imagePullPolicy: IfNotPresent + livenessProbe: + exec: + command: + - ls + - /data/lost+found + initialDelaySeconds: 5 + periodSeconds: 5 + volumeMounts: + - name: volv + mountPath: /data + ports: + - containerPort: 80 + volumes: + - name: volv + persistentVolumeClaim: + claimName: longhorn-volv-pvc + +### Restore to File + +For more information about restoring to file, refer to [this section.](../../advanced-resources/data-recovery/recover-without-system) + + apiVersion: v1 + kind: Pod + metadata: + name: restore-to-file + namespace: longhorn-system + spec: + nodeName: + containers: + - name: restore-to-file + command: + # set restore-to-file arguments here + - /bin/sh + - -c + - longhorn backup restore-to-file + '' + --output-file '/tmp/restore/' + --output-format + # the version of longhorn engine should be v0.4.1 or higher + image: longhorn/longhorn-engine:v0.4.1 + imagePullPolicy: IfNotPresent + securityContext: + privileged: true + volumeMounts: + - name: disk-directory + mountPath: /tmp/restore # the argument should be in this directory + env: + # set Backup Target Credential Secret here. + - name: AWS_ACCESS_KEY_ID + valueFrom: + secretKeyRef: + name: + key: AWS_ACCESS_KEY_ID + - name: AWS_SECRET_ACCESS_KEY + valueFrom: + secretKeyRef: + name: + key: AWS_SECRET_ACCESS_KEY + - name: AWS_ENDPOINTS + valueFrom: + secretKeyRef: + name: + key: AWS_ENDPOINTS + volumes: + # the output file can be found on this host path + - name: disk-directory + hostPath: + path: /tmp/restore + restartPolicy: Never + + +### Simple Pod + + + apiVersion: v1 + kind: Pod + metadata: + name: longhorn-simple-pod + namespace: default + spec: + restartPolicy: Always + containers: + - name: volume-test + image: nginx:stable-alpine + imagePullPolicy: IfNotPresent + livenessProbe: + exec: + command: + - ls + - /data/lost+found + initialDelaySeconds: 5 + periodSeconds: 5 + volumeMounts: + - name: volv + mountPath: /data + ports: + - containerPort: 80 + volumes: + - name: volv + persistentVolumeClaim: + claimName: longhorn-simple-pvc + + + +### Simple PersistentVolumeClaim + + + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: longhorn-simple-pvc + spec: + accessModes: + - ReadWriteOnce + storageClassName: longhorn + resources: + requests: + storage: 1Gi + + + +### StatefulSet + + + apiVersion: v1 + kind: Service + metadata: + name: nginx + labels: + app: nginx + spec: + ports: + - port: 80 + name: web + selector: + app: nginx + type: NodePort + --- + apiVersion: apps/v1 + kind: StatefulSet + metadata: + name: web + spec: + selector: + matchLabels: + app: nginx # has to match .spec.template.metadata.labels + serviceName: "nginx" + replicas: 2 # by default is 1 + template: + metadata: + labels: + app: nginx # has to match .spec.selector.matchLabels + spec: + restartPolicy: Always + terminationGracePeriodSeconds: 10 + containers: + - name: nginx + image: registry.k8s.io/nginx-slim:0.8 + livenessProbe: + exec: + command: + - ls + - /usr/share/nginx/html/lost+found + initialDelaySeconds: 5 + periodSeconds: 5 + ports: + - containerPort: 80 + name: web + volumeMounts: + - name: www + mountPath: /usr/share/nginx/html + volumeClaimTemplates: + - metadata: + name: www + spec: + accessModes: [ "ReadWriteOnce" ] + storageClassName: "longhorn" + resources: + requests: + storage: 1Gi + +### StorageClass + + kind: StorageClass + apiVersion: storage.k8s.io/v1 + metadata: + name: 
longhorn + provisioner: driver.longhorn.io + allowVolumeExpansion: true + reclaimPolicy: Delete + volumeBindingMode: Immediate + parameters: + numberOfReplicas: "3" + staleReplicaTimeout: "2880" # 48 hours in minutes + fromBackup: "" + fsType: "ext4" + # mkfsParams: "-I 256 -b 4096 -O ^metadata_csum,^64bit" + # backingImage: "bi-test" + # backingImageDataSourceType: "download" + # backingImageDataSourceParameters: '{"url": "https://backing-image-example.s3-region.amazonaws.com/test-backing-image"}' + # backingImageChecksum: "SHA512 checksum of the backing image" + # diskSelector: "ssd,fast" + # nodeSelector: "storage,fast" + # recurringJobSelector: '[ + # { + # "name":"snap", + # "isGroup":true, + # }, + # { + # "name":"backup", + # "isGroup":false, + # } + # ]' + +Note that Longhorn supports automatic remount only for the workload pod that is managed by a controller (e.g. deployment, statefulset, daemonset, etc...). +See [here](../../high-availability/recover-volume/) for details. diff --git a/content/docs/1.5.1/references/longhorn-client-python.md b/content/docs/1.5.1/references/longhorn-client-python.md new file mode 100644 index 000000000..58450beb1 --- /dev/null +++ b/content/docs/1.5.1/references/longhorn-client-python.md @@ -0,0 +1,75 @@ +--- +title: Python Client +weight: 2 +--- + +Currently, you can operate Longhorn using Longhorn UI. +We are planning to build a dedicated Longhorn CLI in the upcoming releases. + +In the meantime, you can access Longhorn API using Python binding, as we demonstrated below. + +1. Get Longhorn endpoint + + One way to communicate with Longhorn is through `longhorn-frontend` service. + + If you run your automation/scripting tool inside the same cluster in which Longhorn is installed, connect to the endpoint `http://longhorn-frontend.longhorn-system/v1` + + + If you run your automation/scripting tool on your local machine, + use `kubectl port-forward` to forward the `longhorn-frontend` service to localhost: + ``` + kubectl port-forward services/longhorn-frontend 8080:http -n longhorn-system + ``` + and connect to endpoint `http://localhost:8080/v1` + +2. 
Using Python Client + + Import file [longhorn.py](https://github.com/longhorn/longhorn-tests/blob/master/manager/integration/tests/longhorn.py) which contains the Python client into your Python script and create a client from the endpoint: + ```python + import longhorn + + # If automation/scripting tool is inside the same cluster in which Longhorn is installed + longhorn_url = 'http://longhorn-frontend.longhorn-system/v1' + # If forwarding `longhorn-frontend` service to localhost + longhorn_url = 'http://localhost:8080/v1' + + client = longhorn.Client(url=longhorn_url) + + # Volume operations + # List all volumes + volumes = client.list_volume() + # Get volume by NAME/ID + testvol1 = client.by_id_volume(id="testvol1") + # Attach TESTVOL1 + testvol1 = testvol1.attach(hostId="worker-1") + # Detach TESTVOL1 + testvol1.detach() + # Create a snapshot of TESTVOL1 with NAME + snapshot1 = testvol1.snapshotCreate(name="snapshot1") + # Create a backup from a snapshot NAME + testvol1.snapshotBackup(name=snapshot1.name) + # Update the number of replicas of TESTVOL1 + testvol1.updateReplicaCount(replicaCount=2) + # Find more examples in Longhorn integration tests https://github.com/longhorn/longhorn-tests/tree/master/manager/integration/tests + + # Node operations + # List all nodes + nodes = client.list_node() + # Get node by NAME/ID + node1 = client.by_id_node(id="worker-1") + # Disable scheduling for NODE1 + client.update(node1, allowScheduling=False) + # Enable scheduling for NODE1 + client.update(node1, allowScheduling=True) + # Find more examples in Longhorn integration tests https://github.com/longhorn/longhorn-tests/tree/master/manager/integration/tests + + # Setting operations + # List all settings + settings = client.list_setting() + # Get setting by NAME/ID + backupTargetsetting = client.by_id_setting(id="backup-target") + # Update a setting + backupTargetsetting = client.update(backupTargetsetting, value="s3://backupbucket@us-east-1/") + # Find more examples in Longhorn integration tests https://github.com/longhorn/longhorn-tests/tree/master/manager/integration/tests + ``` + diff --git a/content/docs/1.5.1/references/networking.md b/content/docs/1.5.1/references/networking.md new file mode 100644 index 000000000..9ff2bc93b --- /dev/null +++ b/content/docs/1.5.1/references/networking.md @@ -0,0 +1,181 @@ +--- +title: Longhorn Networking +weight: 3 +--- + +### Overview + +This page documents the networking communication between components in the Longhorn system. Using this information, users can write Kubernetes NetworkPolicy +to control the inbound/outbound traffic to/from Longhorn components. This helps to reduce the damage when a malicious pod breaks into the in-cluster network. + +We have provided some NetworkPolicy example yamls at [here](https://github.com/longhorn/longhorn/tree/master/examples/network-policy). +Or you can enable the setting in the helm chart to install these NetworkPolicy [https://github.com/longhorn/longhorn/blob/master/chart/values.yaml] +Note that depending on the deployed [CNI](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/), not all Kubernetes clusters support NetworkPolicy. +See [here](https://kubernetes.io/docs/concepts/services-networking/network-policies/) for more detail. + +> Note: If you are writing network policies, please revisit this page before upgrading Longhorn to make the necessary adjustments to your network policies. 
+> Note: Depending on the CNI used for the cluster network, there might be some delay when Kubernetes applies network policies to a pod. This delay can cause a Longhorn recurring job for taking a snapshot or backup of a volume to fail, because the job pod cannot reach longhorn-manager right after it starts. This is a known issue found in K3s with Traefik and is beyond Longhorn's control.
+
+### Longhorn Manager
+#### Ingress:
+From | Port | Protocol
+--- | --- | ---
+`Other Longhorn Manager` | 9500 | TCP
+`UI` | 9500 | TCP
+`Longhorn CSI plugin` | 9500 | TCP
+`Backup/Snapshot Recurring Job Pod` | 9500 | TCP
+`Longhorn Driver Deployer` | 9500 | TCP
+
+#### Egress:
+To | Port | Protocol
+--- | --- | ---
+`Other Longhorn Manager` | 9500 | TCP
+`Instance Manager` | 8500; 8501 | TCP
+`Backing Image Manager` | 8000 | TCP
+`Backing Image Data Source` | 8000 | TCP
+`External Backupstore` | User defined | TCP
+`Kubernetes API server` | `Kubernetes API server port` | TCP
+
+### UI
+#### ingress:
+User defined
+#### egress:
+To | Port | Protocol
+--- | --- | ---
+`Longhorn Manager` | 9500 | TCP
+
+### Instance Manager
+#### ingress
+From | Port | Protocol
+--- | --- | ---
+`Longhorn Manager` | 8500; 8501 | TCP
+`Other Instance Manager` | 10000-30000 | TCP
+`Node in the Cluster` | 3260 | TCP
+`Backing Image Data Source` | 10000-30000 | TCP
+
+#### egress:
+To | Port | Protocol
+--- | --- | ---
+`Other Instance Manager` | 10000-30000 | TCP
+`Backing Image Data Source` | 8002 | TCP
+`External Backupstore` | User defined | TCP
+
+### Longhorn CSI plugin
+#### ingress
+None
+
+#### egress:
+To | Port | Protocol
+--- | --- | ---
+`Longhorn Manager` | 9500 | TCP
+
+#### Additional Info
+`Longhorn CSI plugin` pods communicate with `CSI sidecar` pods over the Unix Domain Socket at `/plugins/driver.longhorn.io/csi.sock`.
+
+### CSI sidecar (csi-attacher, csi-provisioner, csi-resizer, csi-snapshotter)
+#### ingress:
+None
+#### egress:
+To | Port | Protocol
+--- | --- | ---
+`Kubernetes API server` | `Kubernetes API server port` | TCP
+
+#### Additional Info
+`CSI sidecar` pods communicate with `Longhorn CSI plugin` pods over the Unix Domain Socket at `/plugins/driver.longhorn.io/csi.sock`.
+
+### Driver deployer
+#### ingress:
+None
+#### egress:
+To | Port | Protocol
+--- | --- | ---
+`Longhorn Manager` | 9500 | TCP
+`Kubernetes API server` | `Kubernetes API server port` | TCP
+
+### Conversion Webhook Server
+#### ingress:
+From | Port | Protocol
+--- | --- | ---
+`Webhook Server` | 9501 | TCP
+
+### Admission Webhook Server
+#### ingress:
+From | Port | Protocol
+--- | --- | ---
+`Webhook Server` | 9502 | TCP
+
+### NFS Recovery Backend Server
+#### ingress:
+From | Port | Protocol
+--- | --- | ---
+`Recovery Backend Server` | 9503 | TCP
+
+### Engine Image
+#### ingress:
+None
+#### egress:
+None
+
+### Backing Image Manager
+#### ingress:
+From | Port | Protocol
+--- | --- | ---
+`Longhorn Manager` | 8000 | TCP
+`Other Backing Image Manager` | 30001-31000 | TCP
+
+#### egress:
+To | Port | Protocol
+--- | --- | ---
+`Instance Manager` | 10000-30000 | TCP
+`Other Backing Image Manager` | 30001-31000 | TCP
+`Backing Image Data Source` | 8000 | TCP
+
+### Backing Image Data Source
+#### ingress:
+From | Port | Protocol
+--- | --- | ---
+`Longhorn Manager` | 8000 | TCP
+`Instance Manager` | 8002 | TCP
+`Backing Image Manager` | 8000 | TCP
+
+#### egress:
+To | Port | Protocol
+--- | --- | ---
+`Instance Manager` | 10000-30000 | TCP
+`User provided server IP to download the images from` | user defined | TCP
+
+### Share Manager
+#### ingress
+From | Port | Protocol +--- | --- | --- +`Node in the cluster` | 2049 | TCP + +#### egress: +None + +### Backup/Snapshot Recurring Job Pod +#### ingress: +None +#### egress: +To | Port | Protocol +--- | --- | --- +`Longhorn Manager` | 9500 | TCP + +### Uninstaller +#### ingress: +None +#### egress: +To | Port | Protocol +--- | --- | --- +`Kubernetes API server` | `Kubernetes API server port` | TCP + +### Discover Proc Kubelet Cmdline +#### ingress: +None +#### egress: +None + +--- +Original GitHub issue: +https://github.com/longhorn/longhorn/issues/1805 diff --git a/content/docs/1.5.1/references/settings.md b/content/docs/1.5.1/references/settings.md new file mode 100644 index 000000000..8767c1e53 --- /dev/null +++ b/content/docs/1.5.1/references/settings.md @@ -0,0 +1,753 @@ +--- +title: Settings Reference +weight: 1 +--- + +- [Customizing Default Settings](#customizing-default-settings) +- [General](#general) + - [Node Drain Policy](#node-drain-policy) + - [Automatically Cleanup System Generated Snapshot](#automatically-cleanup-system-generated-snapshot) + - [Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly](#automatically-delete-workload-pod-when-the-volume-is-detached-unexpectedly) + - [Automatic Salvage](#automatic-salvage) + - [Concurrent Automatic Engine Upgrade Per Node Limit](#concurrent-automatic-engine-upgrade-per-node-limit) + - [Concurrent Volume Backup Restore Per Node Limit](#concurrent-volume-backup-restore-per-node-limit) + - [Create Default Disk on Labeled Nodes](#create-default-disk-on-labeled-nodes) + - [Custom Resource API Version](#custom-resource-api-version) + - [Default Data Locality](#default-data-locality) + - [Default Data Path](#default-data-path) + - [Default Engine Image](#default-engine-image) + - [Default Longhorn Static StorageClass Name](#default-longhorn-static-storageclass-name) + - [Default Replica Count](#default-replica-count) + - [Deleting Confirmation Flag](#deleting-confirmation-flag) + - [Disable Revision Counter](#disable-revision-counter) + - [Enable Upgrade Checker](#enable-upgrade-checker) + - [Latest Longhorn Version](#latest-longhorn-version) + - [Allow Collecting Longhorn Usage Metrics](#allow-collecting-longhorn-usage-metrics) + - [Pod Deletion Policy When Node is Down](#pod-deletion-policy-when-node-is-down) + - [Registry Secret](#registry-secret) + - [Replica Replenishment Wait Interval](#replica-replenishment-wait-interval) + - [System Managed Pod Image Pull Policy](#system-managed-pod-image-pull-policy) + - [Backing Image Cleanup Wait Interval](#backing-image-cleanup-wait-interval) + - [Backing Image Recovery Wait Interval](#backing-image-recovery-wait-interval) + - [Engine to Replica Timeout](#engine-to-replica-timeout) + - [Support Bundle Manager Image](#support-bundle-manager-image) + - [Support Bundle Failed History Limit](#support-bundle-failed-history-limit) + - [Fast Replica Rebuild Enabled](#fast-replica-rebuild-enabled) + - [Timeout of HTTP Client to Replica File Sync Server](#timeout-of-http-client-to-replica-file-sync-server) +- [V2 Data Engine (Preview Feature)](#v2-data-engine-preview-feature) + - [V2 Data Engine](#v2-data-engine) + - [Offline Replica Rebuilding](#offline-replica-rebuilding) +- [Snapshot](#snapshot) + - [Snapshot Data Integrity](#snapshot-data-integrity) + - [Immediate Snapshot Data Integrity Check After Creating a Snapshot](#immediate-snapshot-data-integrity-check-after-creating-a-snapshot) + - [Snapshot Data Integrity Check CronJob](#snapshot-data-integrity-check-cronjob) +- 
[Orphan](#orphan) + - [Orphaned Data Automatic Deletion](#orphaned-data-automatic-deletion) +- [Backups](#backups) + - [Allow Recurring Job While Volume Is Detached](#allow-recurring-job-while-volume-is-detached) + - [Backup Target](#backup-target) + - [Backup Target Credential Secret](#backup-target-credential-secret) + - [Backupstore Poll Interval](#backupstore-poll-interval) + - [Failed Backup Time To Live](#failed-backup-time-to-live) + - [Cronjob Failed Jobs History Limit](#cronjob-failed-jobs-history-limit) + - [Cronjob Successful Jobs History Limit](#cronjob-successful-jobs-history-limit) + - [Restore Volume Recurring Jobs](#restore-volume-recurring-jobs) + - [Backup Compression Method](#backup-compression-method) + - [Backup Concurrent Limit Per Backup](#backup-concurrent-limit-per-backup) + - [Restore Concurrent Limit Per Backup](#restore-concurrent-limit-per-backup) +- [Scheduling](#scheduling) + - [Allow Volume Creation with Degraded Availability](#allow-volume-creation-with-degraded-availability) + - [Disable Scheduling On Cordoned Node](#disable-scheduling-on-cordoned-node) + - [Replica Node Level Soft Anti-Affinity](#replica-node-level-soft-anti-affinity) + - [Replica Zone Level Soft Anti-Affinity](#replica-zone-level-soft-anti-affinity) + - [Replica Auto Balance](#replica-auto-balance) + - [Storage Minimal Available Percentage](#storage-minimal-available-percentage) + - [Storage Over Provisioning Percentage](#storage-over-provisioning-percentage) +- [Danger Zone](#danger-zone) + - [Concurrent Replica Rebuild Per Node Limit](#concurrent-replica-rebuild-per-node-limit) + - [Kubernetes Taint Toleration](#kubernetes-taint-toleration) + - [Priority Class](#priority-class) + - [System Managed Components Node Selector](#system-managed-components-node-selector) + - [Kubernetes Cluster Autoscaler Enabled (Experimental)](#kubernetes-cluster-autoscaler-enabled-experimental) + - [Storage Network](#storage-network) + - [Remove Snapshots During Filesystem Trim](#remove-snapshots-during-filesystem-trim) + - [Guaranteed Instance Manager CPU](#guaranteed-instance-manager-cpu) + + +### Customizing Default Settings + +To configure Longhorn before installing it, see [this section](../../advanced-resources/deploy/customizing-default-settings) for details. + +### General + +#### Node Drain Policy + +> Default: `block-if-contains-last-replica` + +Define the policy to use when a node with the last healthy replica of a volume is drained. Available options: +- `block-if-contains-last-replica`: Longhorn will block the drain when the node contains the last healthy replica of a volume. +- `allow-if-replica-is-stopped`: Longhorn will allow the drain when the node contains the last healthy replica of a volume but the replica is stopped. + WARNING: possible data loss if the node is removed after draining. Select this option if you want to drain the node and do in-place upgrade/maintenance. +- `always-allow`: Longhorn will allow the drain even though the node contains the last healthy replica of a volume. + WARNING: possible data loss if the node is removed after draining. Also possible data corruption if the last replica was running during the draining. 
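+
+For context, a node is typically drained with the standard Kubernetes command sketched below (the node name is a placeholder); the Node Drain Policy above determines whether Longhorn allows such a drain to proceed when the node holds the last healthy replica of a volume:
+
+```
+# Standard Kubernetes drain command (node name is a placeholder).
+# Longhorn's Node Drain Policy decides whether the drain is blocked when this
+# node contains the last healthy replica of a volume.
+kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
+```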
+
+#### Automatically Cleanup System Generated Snapshot
+
+> Default: `true`
+
+Longhorn generates system snapshots during replica rebuilds. If a user doesn't set up a recurring snapshot schedule, all of the system-generated snapshots are left in the replica and have to be deleted manually. This setting allows Longhorn to automatically clean up system-generated snapshots before and after a replica rebuild.
+
+#### Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly
+
+> Default: `true`
+
+If enabled, Longhorn will automatically delete the workload pod that is managed by a controller (e.g. deployment, statefulset, daemonset, etc...) when a Longhorn volume is detached unexpectedly (e.g. during a Kubernetes upgrade, a Docker reboot, or a network disconnection).
+By deleting the pod, its controller restarts the pod and Kubernetes handles volume reattachment and remount.
+
+If disabled, Longhorn will not delete the workload pod that is managed by a controller. You will have to manually restart the pod to reattach and remount the volume.
+
+> **Note:** This setting doesn't apply to the cases below.
+> - The workload pods don't have a controller; Longhorn never deletes them.
+> - The volumes used by workloads are RWX, because the Longhorn share manager, which provides the RWX NFS service, has its own resilience mechanism to ensure availability until the volume gets reattached, without relying on the pod lifecycle to trigger volume reattachment. For details, see [here](../../advanced-resources/rwx-workloads).
+
+#### Automatic Salvage
+
+> Default: `true`
+
+If enabled, volumes will be automatically salvaged when all the replicas become faulty, e.g. due to a network disconnection. Longhorn will try to figure out which replica(s) are usable and then use them for the volume.
+
+#### Concurrent Automatic Engine Upgrade Per Node Limit
+
+> Default: `0`
+
+This setting controls how Longhorn automatically upgrades volumes' engines to the new default engine image after upgrading Longhorn manager.
+The value of this setting specifies the maximum number of engines per node that are allowed to upgrade to the default engine image at the same time.
+If the value is 0, Longhorn will not automatically upgrade volumes' engines to the default version.
+
+#### Concurrent Volume Backup Restore Per Node Limit
+
+> Default: `5`
+
+This setting controls how many volumes on a node can restore backups concurrently.
+
+Longhorn blocks the backup restore once the restoring volume count exceeds the limit.
+
+Set the value to **0** to disable backup restore.
+
+#### Create Default Disk on Labeled Nodes
+
+> Default: `false`
+
+If no other disks exist, create the default disk automatically, only on nodes with the Kubernetes label `node.longhorn.io/create-default-disk=true`.
+
+If disabled, the default disk will be created on all new nodes when the node is detected for the first time.
+
+This option is useful if you want to scale the cluster but don't want to use storage on the new nodes, or if you want to [customize disks for Longhorn nodes](../../advanced-resources/default-disk-and-node-config).
+
+#### Custom Resource API Version
+
+> Default: `longhorn.io/v1beta2`
+
+The current custom resource API version, e.g. longhorn.io/v1beta2. Set by the manager automatically.
+
+#### Default Data Locality
+
+> Default: `disabled`
+
+We say a Longhorn volume has data locality if there is a local replica of the volume on the same node as the pod which is using the volume.
+This setting specifies the default data locality when a volume is created from the Longhorn UI. For Kubernetes configuration, set the `dataLocality` parameter in the StorageClass (a StorageClass sketch appears further below in this section).
+
+The available modes are:
+
+- `disabled`. This is the default option.
+  There may or may not be a replica on the same node as the attached volume (workload).
+
+- `best-effort`. This option instructs Longhorn to try to keep a replica on the same node as the attached volume (workload).
+  Longhorn will not stop the volume, even if it cannot keep a replica local to the attached volume (workload) due to an environment limitation, e.g. not enough disk space, incompatible disk tags, etc.
+
+- `strict-local`. This option forces Longhorn to keep the volume's **only replica** on the same node as the attached volume, and therefore offers higher IOPS and lower latency.
+
+#### Default Data Path
+
+> Default: `/var/lib/longhorn/`
+
+Default path to use for storing data on a host.
+
+Can be used with the `Create Default Disk on Labeled Nodes` option to make Longhorn use only the nodes that have storage mounted at a specific path, for example `/opt/longhorn`, when scaling the cluster.
+
+#### Default Engine Image
+
+The default engine image used by the manager. Can be changed only on the manager's startup command line.
+
+Every Longhorn release will ship with a new Longhorn engine image. If the current Longhorn volumes are not using the default engine, a green arrow will show up, indicating that the volume needs to be upgraded to use the default engine.
+
+#### Default Longhorn Static StorageClass Name
+
+> Default: `longhorn-static`
+
+The `storageClassName` set on persistent volumes (PVs) and persistent volume claims (PVCs) when creating a PV/PVC for an existing Longhorn volume. Note that it's unnecessary for users to create the related StorageClass object in Kubernetes, since the StorageClass name is only used as a matching label for PVC binding purposes. Defaults to 'longhorn-static'.
+
+#### Default Replica Count
+
+> Default: `3`
+
+The default number of replicas when creating a volume from the Longhorn UI. For Kubernetes, set the `numberOfReplicas` parameter in the StorageClass.
+
+The recommended way of choosing the default replica count is: if you have three or more nodes for storage, use 3; otherwise use 2. Using a single replica on a single-node cluster is also OK, but the high availability functionality wouldn't be available. You can still take snapshots/backups of the volume.
+
+#### Deleting Confirmation Flag
+This flag protects Longhorn from an unexpected uninstallation, which would lead to data loss.
+Set this flag to **true** to allow Longhorn uninstallation.
+If this flag is **false**, the Longhorn uninstallation job will fail.
+
+> Default: `false`
+
+#### Disable Revision Counter
+
+> Default: `false`
+
+Allows the engine controller and engine replicas to disable the revision counter file update for every data write. This improves data path performance. See [Revision Counter](../../advanced-resources/deploy/revision_counter) for details.
+
+#### Enable Upgrade Checker
+
+> Default: `true`
+
+The Upgrade Checker periodically checks for a new Longhorn version. When a new version is available, it notifies the user in the Longhorn UI.
+
+#### Latest Longhorn Version
+
+The latest version of Longhorn available. Automatically updated by the Upgrade Checker.
+
+> Only available if `Upgrade Checker` is enabled.
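+
+Note that several of the settings above (such as `Default Data Locality` and `Default Replica Count`) only control the defaults used when a volume is created from the Longhorn UI; for volumes provisioned through Kubernetes, the equivalent knobs are StorageClass parameters. A minimal sketch is shown below, with illustrative values, a hypothetical StorageClass name, and parameter names following the StorageClass example in the [Examples](../examples) reference:
+
+```yaml
+# Illustrative StorageClass overriding the UI defaults discussed above.
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+  name: longhorn-fast            # hypothetical name
+provisioner: driver.longhorn.io
+parameters:
+  numberOfReplicas: "2"          # per-StorageClass replica count
+  dataLocality: "best-effort"    # per-StorageClass data locality
+  staleReplicaTimeout: "2880"
+```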
+ +#### Allow Collecting Longhorn Usage Metrics + +> Default: `true` + +Enabling this setting will allow Longhorn to provide valuable usage metrics to https://metrics.longhorn.io/. + +This information will help us gain insights how Longhorn is being used, which will ultimately contribute to future improvements. + +**Node Information collected from all cluster nodes includes:** +- Number of disks of each device type (HDD, SSD, NVMe, unknown). + > This value may not be accurate for virtual machines. +- Host kernel release. +- Host operating system (OS) distribution. +- Kubernetest node provider. + +**Cluster Information collected from one of the cluster nodes includes:** +- Longhorn namespace UID. +- Number of Longhorn nodes. +- Number of volumes of each access mode (RWO, RWX, unknown). +- Number of volumes of each data locality type (disabled, best_effort, strict_local, unknown). +- Number of volumes of each frontend type (blockdev, iscsi). +- Average volume size in bytes. +- Average volume actual size in bytes. +- Average number of snapshots per volume. +- Average number of replicas per volume. +- Average Longhorn component CPU usage (instance manager, manager) in millicores. +- Average Longhorn component memory usage (instance manager, manager) in bytes. +- Longhorn settings: + - Partially included: + - Backup Target Type/Protocol (azblob, cifs, nfs, s3, none, unknown). This is from the Backup Target setting. + - Included as true or false to indicate if this setting is configured: + - Priority Class + - Registry Secret + - Storage Network + - System Managed Components Node Selector + - Taint Toleration + - Included as it is: + - Allow Recurring Job While Volume Is Detached + - Allow Volume Creation With Degraded Availability + - Automatically Cleanup System Generated Snapshot + - Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly + - Automatic Salvage + - Backing Image Cleanup Wait Interval + - Backing Image Recovery Wait Interval + - Backup Compression Method + - Backupstore Poll Interval + - Backup Concurrent Limit + - Concurrent Automatic Engine Upgrade Per Node Limit + - Concurrent Backup Restore Per Node Limit + - Concurrent Replica Rebuild Per Node Limit + - CRD API Version + - Create Default Disk Labeled Nodes + - Default Data Locality + - Default Replica Count + - Disable Revision Counter + - Disable Scheduling On Cordoned Node + - Engine Replica Timeout + - Failed Backup TTL + - Fast Replica Rebuild Enabled + - Guaranteed Instance Manager CPU + - Kubernetes Cluster Autoscaler Enabled + - Node Down Pod Deletion Policy + - Node Drain Policy + - Orphan Auto Deletion + - Recurring Failed Jobs History Limit + - Recurring Successful Jobs History Limit + - Remove Snapshots During Filesystem Trim + - Replica Auto Balance + - Replica File Sync HTTP Client Timeout + - Replica Replenishment Wait Interval + - Replica Soft Anti Affinity + - Replica Zone Soft Anti Affinity + - Restore Concurrent Limit + - Restore Volume Recurring Jobs + - Snapshot Data Integrity + - Snapshot Data Integrity CronJob + - Snapshot DataIntegrity Immediate Check After Snapshot Creation + - Storage Minimal Available Percentage + - Storage Over Provisioning Percentage + - Storage Reserved Percentage For Default Disk + - Support Bundle Failed History Limit + - System Managed Pods Image Pull Policy + +> The `Upgrade Checker` needs to be enabled to periodically send the collected data. 
+ +#### Pod Deletion Policy When Node is Down + +> Default: `do-nothing` + +Defines the Longhorn action when a Volume is stuck with a StatefulSet/Deployment Pod on a node that is down. + +- `do-nothing` is the default Kubernetes behavior of never force deleting StatefulSet/Deployment terminating pods. Since the pod on the node that is down isn't removed, Longhorn volumes are stuck on nodes that are down. +- `delete-statefulset-pod` Longhorn will force delete StatefulSet terminating pods on nodes that are down to release Longhorn volumes so that Kubernetes can spin up replacement pods. +- `delete-deployment-pod` Longhorn will force delete Deployment terminating pods on nodes that are down to release Longhorn volumes so that Kubernetes can spin up replacement pods. +- `delete-both-statefulset-and-deployment-pod` Longhorn will force delete StatefulSet/Deployment terminating pods on nodes that are down to release Longhorn volumes so that Kubernetes can spin up replacement pods. + +#### Registry Secret + +The Kubernetes Secret name. + +#### Replica Replenishment Wait Interval + +> Default: `600` + +When there is at least one failed replica volume in a degraded volume, this interval in seconds determines how long Longhorn will wait at most in order to reuse the existing data of the failed replicas rather than directly creating a new replica for this volume. + +Warning: This wait interval works only when there is at least one failed replica in the volume. And this option may block the rebuilding for a while. + +#### System Managed Pod Image Pull Policy + +> Default: `if-not-present` + +This setting defines the Image Pull Policy of Longhorn system managed pods, e.g. instance manager, engine image, CSI driver, etc. + +Notice that the new Image Pull Policy will only apply after the system managed pods restart. + +This setting definition is exactly the same as that of in Kubernetes. Here are the available options: + +- `always`. Every time the kubelet launches a container, the kubelet queries the container image registry to resolve the name to an image digest. If the kubelet has a container image with that exact digest cached locally, the kubelet uses its cached image; otherwise, the kubelet downloads (pulls) the image with the resolved digest, and uses that image to launch the container. + +- `if-not-present`. The image is pulled only if it is not already present locally. + +- `never`. The image is assumed to exist locally. No attempt is made to pull the image. + + +#### Backing Image Cleanup Wait Interval +> Default: `60` + +This interval in minutes determines how long Longhorn will wait before cleaning up the backing image file when there is no replica in the disk using it. + +#### Backing Image Recovery Wait Interval +> Default: `300` + +The interval in seconds determines how long Longhorn will wait before re-downloading the backing image file when all disk files of this backing image become `failed` or `unknown`. + +> **Note:** +> - This recovery only works for the backing image of which the creation type is `download`. +> - File state `unknown` means the related manager pods on the pod is not running or the node itself is down/disconnected. + +#### Engine to Replica Timeout +> Default: `8` + +The value in seconds specifies the timeout of the engine to the replica(s), and the value should be between 8 to 30 seconds. + +#### Support Bundle Manager Image + +Longhorn uses the support bundle manager image to generate the support bundles. 
+
+There will be a default image given during installation and upgrade. You can also change it in the settings.
+
+An example of the support bundle manager image:
+> Default: `longhornio/support-bundle-kit:v0.0.14`
+
+#### Support Bundle Failed History Limit
+
+> Default: `1`
+
+This setting specifies how many failed support bundles can exist in the cluster.
+
+The retained failed support bundles are for analysis purposes and need to be cleaned up manually.
+
+Longhorn blocks support bundle creation when reaching the upper bound of the limitation. You can set this value to **0** to have Longhorn automatically purge all failed support bundles.
+
+#### Fast Replica Rebuild Enabled
+
+> Default: `false`
+
+This setting enables the fast replica rebuilding feature. It relies on the checksums of snapshot disk files, so setting the snapshot-data-integrity to **enable** or **fast-check** is a prerequisite.
+
+#### Timeout of HTTP Client to Replica File Sync Server
+
+> Default: `30`
+
+The value in seconds specifies the timeout of the HTTP client to the replica's file sync server used for replica rebuilding, volume cloning, snapshot cloning, etc.
+
+### V2 Data Engine (Preview Feature)
+#### V2 Data Engine
+
+> Default: `false`
+
+This allows users to activate the v2 data engine based on SPDK. Currently, it is in the preview phase and should not be utilized in a production environment. For more information, please refer to [V2 Data Engine (Preview Feature)](../../spdk).
+
+> **Warning**
+>
+> - DO NOT CHANGE THIS SETTING WITH ATTACHED VOLUMES. Longhorn will block this setting update when there are attached volumes.
+>
+> - When applying the setting, Longhorn will restart all instance-manager pods.
+>
+> - When the V2 Data Engine is enabled, each instance-manager pod utilizes 1 CPU core. This high CPU usage is attributed to the spdk_tgt process running within each instance-manager pod. The spdk_tgt process is responsible for handling input/output (IO) operations and requires intensive polling. As a result, it consumes 100% of a dedicated CPU core to efficiently manage and process the IO requests, ensuring optimal performance and responsiveness for storage operations.
+
+#### Offline Replica Rebuilding
+
+> Default: `enabled`
+
+This setting allows users to enable offline replica rebuilding for volumes using the v2 data engine. For more information, please refer to [Automatic Offline Replica Rebuilding](../../spdk/automatic-offline-replica-rebuilding).
+
+Here are the available options:
+- `enabled`
+- `disabled`
+
+### Snapshot
+
+#### Snapshot Data Integrity
+
+> Default: `fast-check`
+
+This setting allows users to enable or disable snapshot hashing and data integrity checking. Available options are:
+- **disabled**: Disable snapshot disk file hashing and data integrity checking.
+- **enabled**: Enable periodic snapshot disk file hashing and data integrity checking. To detect filesystem-unaware corruption caused by bit rot or other issues in snapshot disk files, the Longhorn system periodically hashes files and finds corrupted ones. Hence, the system performance will be impacted during the periodic checking.
+- **fast-check**: Enable snapshot disk file hashing and fast data integrity checking. The Longhorn system only hashes snapshot disk files if they are not hashed or the modification time has changed. In this mode, filesystem-unaware corruption cannot be detected, but the impact on system performance can be minimized.
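+Because fast replica rebuilding relies on the snapshot checksums produced by this data integrity feature, the two settings are typically reviewed together. A minimal sketch using the same `kubectl` pattern shown elsewhere in these docs, assuming the setting ids are `snapshot-data-integrity` and `fast-replica-rebuild-enabled`:
+
+```bash
+# Check the current values of both settings.
+kubectl -n longhorn-system get settings.longhorn.io snapshot-data-integrity fast-replica-rebuild-enabled
+
+# Set snapshot-data-integrity to "enabled" or "fast-check" first, then enable fast replica rebuilding.
+kubectl -n longhorn-system edit settings.longhorn.io snapshot-data-integrity
+kubectl -n longhorn-system edit settings.longhorn.io fast-replica-rebuild-enabled
+```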
+
+#### Immediate Snapshot Data Integrity Check After Creating a Snapshot
+
+> Default: `false`
+
+Hashing snapshot disk files impacts the performance of the system. The immediate snapshot hashing and checking can be disabled to minimize the impact after creating a snapshot.
+
+#### Snapshot Data Integrity Check CronJob
+
+> Default: `0 0 */7 * *`
+
+Unix-cron string format. The setting specifies when Longhorn checks the data integrity of snapshot disk files.
+> **Warning**
+> Hashing snapshot disk files impacts the performance of the system. It is recommended to run data integrity checks during off-peak times and to reduce the frequency of checks.
+
+### Orphan
+
+#### Orphaned Data Automatic Deletion
+> Default: `false`
+
+This setting allows Longhorn to automatically delete the `orphan` resource and its orphaned data, such as an orphaned volume replica.
+
+### Backups
+
+#### Allow Recurring Job While Volume Is Detached
+
+> Default: `false`
+
+If this setting is enabled, Longhorn automatically attaches the volume and takes a snapshot/backup when it is time to do a recurring snapshot/backup.
+
+> **Note:** While the volume is automatically attached, it is not ready for the workload. The workload will have to wait until the recurring job finishes.
+
+#### Backup Target
+
+> Example: `s3://backupbucket@us-east-1/backupstore`
+
+The target used for backup. NFS and S3 are supported. See [Setting a Backup Target](../../snapshots-and-backups/backup-and-restore/set-backup-target) for details.
+
+#### Backup Target Credential Secret
+
+> Example: `s3-secret`
+
+The Kubernetes secret associated with the backup target. See [Setting a Backup Target](../../snapshots-and-backups/backup-and-restore/set-backup-target) for details.
+
+#### Backupstore Poll Interval
+
+> Default: `300`
+
+The interval in seconds to poll the backup store for updating volumes' **Last Backup** field. Set to 0 to disable the polling. See [Setting up Disaster Recovery Volumes](../../snapshots-and-backups/setup-disaster-recovery-volumes) for details.
+
+For more information on how the backupstore poll interval affects the recovery time objective and recovery point objective, refer to the [concepts section.](../../concepts/#34-backupstore-update-intervals-rto-and-rpo)
+
+#### Failed Backup Time To Live
+
+> Default: `1440`
+
+The interval in minutes to keep a failed backup resource. Set to 0 to disable the auto-deletion.
+
+Failed backups will be checked and cleaned up during backupstore polling, which is controlled by the **Backupstore Poll Interval** setting. Hence this value determines the minimum wait interval before cleanup, and the actual cleanup interval is a multiple of **Backupstore Poll Interval**. Disabling **Backupstore Poll Interval** also disables failed backup auto-deletion.
+
+#### Cronjob Failed Jobs History Limit
+
+> Default: `1`
+
+This setting specifies how many failed backup or snapshot job histories should be retained.
+
+History will not be retained if the value is 0.
+
+#### Cronjob Successful Jobs History Limit
+
+> Default: `1`
+
+This setting specifies how many successful backup or snapshot job histories should be retained.
+
+History will not be retained if the value is 0.
+
+#### Restore Volume Recurring Jobs
+
+> Default: `false`
+
+This setting allows restoring the recurring jobs of a backup volume from the backup target during a volume restoration if they do not exist on the cluster.
+This is also a volume-specific setting with the below options. Users can customize it for each volume to override the global setting.
+
+> Default: `ignored`
+
+- `ignored`: This is the default option that instructs Longhorn to inherit from the global setting.
+
+- `enabled`: This option instructs Longhorn to restore volume recurring jobs/groups from the backup target forcibly.
+
+- `disabled`: This option instructs Longhorn not to restore volume recurring jobs/groups.
+
+#### Backup Compression Method
+
+> Default: `lz4`
+
+This setting allows users to specify the backup compression method.
+
+- `none`: Disable the compression method. Suitable for multimedia data such as encoded images and videos.
+
+- `lz4`: Fast compression method. Suitable for flat files.
+
+- `gzip`: Slightly higher compression ratio but relatively slow.
+
+#### Backup Concurrent Limit Per Backup
+
+> Default: `2`
+
+This setting controls how many worker threads run concurrently per backup.
+
+#### Restore Concurrent Limit Per Backup
+
+> Default: `2`
+
+This setting controls how many worker threads run concurrently per restore.
+
+### Scheduling
+
+#### Allow Volume Creation with Degraded Availability
+
+> Default: `true`
+
+This setting allows users to create and attach a volume that doesn't have all of its replicas scheduled at the time of creation.
+
+> **Note:** It's recommended to disable this setting when using Longhorn in a production environment. See [Best Practices](../../best-practices/) for details.
+
+#### Disable Scheduling On Cordoned Node
+
+> Default: `true`
+
+When this setting is checked, the Longhorn Manager will not schedule replicas on Kubernetes cordoned nodes.
+
+When this setting is un-checked, the Longhorn Manager will schedule replicas on Kubernetes cordoned nodes.
+
+#### Replica Node Level Soft Anti-Affinity
+
+> Default: `false`
+
+When this setting is checked, the Longhorn Manager will allow scheduling on nodes with existing healthy replicas of the same volume.
+
+When this setting is un-checked, the Longhorn Manager will not allow scheduling on nodes with existing healthy replicas of the same volume.
+
+#### Replica Zone Level Soft Anti-Affinity
+
+> Default: `true`
+
+When this setting is checked, the Longhorn Manager will allow scheduling new replicas of a volume to the nodes in the same zone as existing healthy replicas.
+
+When this setting is un-checked, the Longhorn Manager will not allow scheduling new replicas of a volume to the nodes in the same zone as existing healthy replicas.
+
+> **Note:**
+> - Nodes that don't belong to any zone will be treated as if they belong to the same zone.
+> - Longhorn relies on label `topology.kubernetes.io/zone=` in the Kubernetes node object to identify the zone.
+
+#### Replica Auto Balance
+
+> Default: `disabled`
+
+Enabling this setting automatically rebalances replicas when an available node is discovered.
+
+The available global options are:
+- `disabled`. This is the default option. No replica auto-balance will be done.
+
+- `least-effort`. This option instructs Longhorn to balance replicas for minimal redundancy.
+
+- `best-effort`. This option instructs Longhorn to try balancing replicas for even redundancy.
+  Longhorn does not forcefully re-schedule the replicas to a zone that does not have enough nodes
+  to support even balance. Instead, Longhorn will re-schedule to balance at the node level.
+
+Longhorn also supports customizing this for individual volumes. The setting can be specified in the UI or with the Kubernetes manifest field `volume.spec.replicaAutoBalance`, which overrules the global setting.
+The available volume spec options are:
+
+> Default: `ignored`
+
+- `ignored`. This is the default option that instructs Longhorn to inherit from the global setting.
+
+- `disabled`. This option instructs Longhorn not to do replica auto-balance for this volume.
+
+- `least-effort`. This option instructs Longhorn to balance replicas for minimal redundancy.
+
+- `best-effort`. This option instructs Longhorn to try balancing replicas for even redundancy.
+  Longhorn does not forcefully re-schedule the replicas to a zone that does not have enough nodes
+  to support even balance. Instead, Longhorn will re-schedule to balance at the node level.
+
+#### Storage Minimal Available Percentage
+
+> Default: `25`
+
+With the default setting of 25, the Longhorn Manager will allow scheduling new replicas only if, after the required disk space is subtracted from the available disk space (**Storage Available**), the remaining available disk space is still over 25% of the actual disk capacity (**Storage Maximum**). Otherwise, the disk becomes unschedulable until more space is freed up.
+
+See [Multiple Disks Support](../../volumes-and-nodes/multidisk/#configuration) for details.
+
+#### Storage Over Provisioning Percentage
+
+> Default: `100`
+
+The over-provisioning percentage defines the amount of storage that can be allocated relative to the hard drive's capacity.
+
+By increasing this setting, the Longhorn Manager will allow scheduling new replicas as long as, after the required disk space is added to the used disk space (**Storage Scheduled**), the used disk space is not over the over-provisioning percentage of the actual usable disk capacity (**Storage Maximum** - **Storage Reserved**).
+
+It's worth noting that a volume replica may require more storage space than the volume's actual size, as the snapshots also require storage. You can regain space by deleting unnecessary snapshots.
+
+### Danger Zone
+
+#### Concurrent Replica Rebuild Per Node Limit
+
+> Default: `5`
+
+This setting controls how many replicas on a node can be rebuilt simultaneously.
+
+Longhorn blocks a replica from starting to rebuild once the current rebuilding count on a node exceeds the limit. When the value is 0, replica rebuilding is disabled.
+
+> **WARNING:**
+>  - The old setting "Disable Replica Rebuild" is replaced by this setting.
+>  - Different from relying on replica starting delay to limit the concurrent rebuilding, if the rebuilding is disabled, replica object replenishment will be directly skipped.
+>  - When the value is 0, the eviction and data locality feature won't work. But this shouldn't have any impact on any current replica rebuild and backup restore.
+
+#### Kubernetes Taint Toleration
+
+> Example: `nodetype=storage:NoSchedule`
+
+If you want to dedicate nodes to just store Longhorn replicas and reject other general workloads, you can set tolerations for **all** Longhorn components and add taints to the nodes dedicated for storage.
+
+The Longhorn system contains user deployed components (e.g., Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g., instance manager, engine image, CSI driver, etc.).
+This setting only sets taint tolerations for system managed components.
+Depending on how you deployed Longhorn, you need to set taint tolerations for user deployed components in the Helm chart or deployment YAML file.
+
+All Longhorn volumes should be detached before modifying toleration settings.
+We recommend setting tolerations during Longhorn deployment because the Longhorn system cannot be operated during the update.
+
+Multiple tolerations can be set here, and these tolerations are separated by semicolons. For example:
+* `key1=value1:NoSchedule; key2:NoExecute`
+* `:` this toleration tolerates everything because an empty key with operator `Exists` matches all keys, values and effects
+* `key1=value1:` this toleration has an empty effect. It matches all effects with key `key1`
+  See [Taint Toleration](../../advanced-resources/deploy/taint-toleration) for details.
+
+#### Priority Class
+
+> Example: `high-priority`
+
+By default, Longhorn workloads run with the same priority as other pods in the cluster, meaning in cases of node pressure, such as a node running out of memory, Longhorn workloads will be at the same priority as other Pods for eviction.
+
+The Priority Class setting will specify a Priority Class for the Longhorn workloads to run as. This can be used to set the priority for Longhorn workloads higher so that they will not be the first to be evicted when a node is under pressure.
+
+The Longhorn system contains user deployed components (e.g., Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g., instance manager, engine image, CSI driver, etc.).
+
+Note that this setting only sets the Priority Class for system managed components.
+Depending on how you deployed Longhorn, you need to set the Priority Class for user deployed components in the Helm chart or deployment YAML file.
+
+> **Warning:** This setting should only be changed after detaching all Longhorn volumes, as the Longhorn system components will be restarted to apply the setting. The Priority Class update will take a while, and users cannot operate the Longhorn system during the update. Hence, it's recommended to set the Priority Class during Longhorn deployment.
+
+See [Priority Class](../../advanced-resources/deploy/priority-class) for details.
+
+#### System Managed Components Node Selector
+
+> Example: `label-key1:label-value1;label-key2:label-value2`
+
+If you want to restrict Longhorn components to only run on a particular set of nodes, you can set a node selector for all Longhorn components.
+
+The Longhorn system contains user deployed components (e.g., Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g., instance manager, engine image, CSI driver, etc.).
+You need to set the node selector for both of them. This setting only sets the node selector for system managed components. Follow the instructions at [Node Selector](../../advanced-resources/deploy/node-selector) to change the node selector.
+
+> **Warning:** Since all Longhorn components will be restarted, the Longhorn system is temporarily unavailable.
+Make sure all Longhorn volumes are `detached`. If there are running Longhorn volumes in the system, the Longhorn system cannot restart its components and the request will be rejected.
+Don't operate the Longhorn system while node selector settings are updated and Longhorn components are being restarted.
+
+#### Kubernetes Cluster Autoscaler Enabled (Experimental)
+
+> Default: `false`
+
+Setting Kubernetes Cluster Autoscaler Enabled to `true` allows Longhorn to unblock Kubernetes Cluster Autoscaler scaling.
+
+See [Kubernetes Cluster Autoscaler Support](../../high-availability/k8s-cluster-autoscaler) for details.
+
+> **Warning:** Replica rebuilding could be expensive because nodes with reusable replicas could get removed by the Kubernetes Cluster Autoscaler.
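+The taint toleration, priority class, and node selector values for system managed components described above are regular Longhorn settings, so they can also be applied declaratively rather than through the UI. A minimal sketch, assuming the setting id is `taint-toleration` (verify with `kubectl -n longhorn-system get settings.longhorn.io`) and remembering that all Longhorn volumes must be detached first:
+
+```yaml
+apiVersion: longhorn.io/v1beta2
+kind: Setting
+metadata:
+  name: taint-toleration
+  namespace: longhorn-system
+# Tolerate a hypothetical storage-dedicated taint for system managed components.
+value: "nodetype=storage:NoSchedule"
+```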
+
+#### Storage Network
+
+> Example: `kube-system/demo-192-168-0-0`
+
+The storage network uses a Multus NetworkAttachmentDefinition to segregate the in-cluster data traffic from the default Kubernetes cluster network.
+
+> **Warning:** This setting should only be changed after detaching all Longhorn volumes, as some of the Longhorn system component pods will get recreated to apply the setting. Longhorn will try to block this setting update when there are attached volumes.
+
+See [Storage Network](../../advanced-resources/deploy/storage-network) for details.
+
+#### Remove Snapshots During Filesystem Trim
+
+> Example: `false`
+
+This setting allows the Longhorn filesystem trim feature to automatically mark the latest snapshot and its ancestors as removed, stopping at the snapshot that contains multiple children.
+
+This is because the Longhorn filesystem trim feature can be applied only to the volume head and the continuous chain of removed or system snapshots that follows it.
+
+Notice that trying to trim a removed file from a valid snapshot will do nothing, but the filesystem will discard this kind of in-memory trimmable file info. Later on, if you mark the snapshot as removed and want to retry the trim, you may need to unmount and remount the filesystem so that the filesystem can recollect the trimmable file info.
+
+See [Trim Filesystem](../../volumes-and-nodes/trim-filesystem) for details.
+
+#### Guaranteed Instance Manager CPU
+
+> Default: `12`
+
+This integer value indicates what percentage of the total allocatable CPU on each node will be reserved for each instance manager Pod. For example, 10 means 10% of the total CPU on a node will be allocated to each instance manager pod on this node. This will help maintain engine and replica stability during high node workload.
+
+In order to prevent an unexpected volume instance (engine/replica) crash as well as guarantee a relatively acceptable I/O performance, you can use the following formula to calculate a value for this setting:
+
+    Guaranteed Instance Manager CPU = The estimated max Longhorn volume engine and replica count on a node * 0.1 / The total allocatable CPUs on the node * 100.
+
+The result of the above calculation is not the maximum amount of CPU resources the Longhorn workloads require. To fully exploit the Longhorn volume I/O performance, you can allocate/guarantee more CPU resources via this setting.
+
+If it's hard to estimate the usage now, you can leave it with the default value, which is 12%. Then you can tune it when there is no running workload using Longhorn volumes.
+
+> **Warning:**
+>  - Value 0 means removing the CPU requests from the spec of instance manager pods.
+>  - Considering the possible number of new instance manager pods in a future system upgrade, this integer value ranges from 0 to 40.
+>  - One more set of instance manager pods may need to be deployed when the Longhorn system is upgraded. If the currently available CPUs of the nodes are not enough for the new instance manager pods, you need to detach the volumes using the oldest instance manager pods so that Longhorn can clean up the old pods automatically and release the CPU resources. The new pods with the latest instance manager image will then be launched.
+>  - This global setting will be ignored for a node if the field "InstanceManagerCPURequest" on the node is set.
+>  - After this setting is changed, all instance manager pods using this global setting on all the nodes will be automatically restarted. In other words, DO NOT CHANGE THIS SETTING WITH ATTACHED VOLUMES.
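+As a worked example of the Guaranteed Instance Manager CPU formula above (the numbers are hypothetical), a node with 8 allocatable CPUs that is expected to host at most 20 Longhorn volume engines and replicas would reserve roughly 2 CPU cores for its instance manager pod, which corresponds to a setting value of 25:
+
+    Estimated max engine and replica count on the node = 20
+    Total allocatable CPUs on the node = 8
+    Guaranteed Instance Manager CPU = 20 * 0.1 / 8 * 100 = 25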
diff --git a/content/docs/1.5.1/snapshots-and-backups/_index.md b/content/docs/1.5.1/snapshots-and-backups/_index.md new file mode 100644 index 000000000..60eee74a8 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/_index.md @@ -0,0 +1,5 @@ +--- + title: Backup and Restore + description: Backup and Restore Volume Snapshots in Longhorn + weight: 6 +--- \ No newline at end of file diff --git a/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/_index.md b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/_index.md new file mode 100644 index 000000000..9976d36d9 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/_index.md @@ -0,0 +1,19 @@ +--- +title: Backup and Restore +weight: 2 +--- + +> Before v1.2.0, Longhorn uses a blocking way for communication with the remote backup target, so there will be some potential voluntary or involuntary factors (ex: network latency) impacting the functions relying on remote backup target like listing backups or even causing further cascading problems after the backup target operation. + +> Since v1.2.0, Longhorn starts using an asynchronous way to do backup operations to resolve the abovementioned issues in the previous versions. +> - Create backup cluster custom resources first, and then perform the following snapshot and backup operations to the remote backup target. +> - Once the backup creation is completed, asynchronously pull the state of backup volumes and backups from the remote backup target, then update the status of the corresponding cluster custom resources. +> +> Besides, this enhancement is scalable for the backup query to solve the costly resources (even query timeout) caused by the original blocking way because all backups are saved as custom resources instead of querying from the remote target directly. +> +> Please note that: after the Longhorn upgrade, if a volume hasn’t been upgraded to the latest longhorn engine (>=v1.2.0). When creating a backup, it will have the intermediate transition state of the name of the created backup (due to the different backup name handling in the latest longhorn version >= v1.2.0). However, in the end, Longhorn will ensure the backup is synced with the remote backup target and the backup will be updated to the final correct state as the remote backup target is the single source of truth. To upgrade the Longhorn engine, please refer to [Manually Upgrade Longhorn Engine](../../deploy/upgrade/upgrade-engine) or [Automatically Upgrade Longhorn Engine](../../deploy/upgrade/auto-upgrade-engine). + +- [Setting a Backup Target](./set-backup-target) +- [Create a Backup](./create-a-backup) +- [Restore from a Backup](./restore-from-a-backup) +- [Restoring Volumes for Kubernetes StatefulSets](./restore-statefulset) diff --git a/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/create-a-backup.md b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/create-a-backup.md new file mode 100644 index 000000000..4cb0f419a --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/create-a-backup.md @@ -0,0 +1,19 @@ +--- +title: Create a Backup +weight: 2 +--- + +Backups in Longhorn are objects in an off-cluster backupstore. A backup of a snapshot is copied to the backupstore, and the endpoint to access the backupstore is the backup target. For more information, see [this section.](../../../concepts/#31-how-backups-work) + +> **Prerequisite:** A backup target must be set up. 
For more information, see [Set the BackupTarget](../set-backup-target). If the BackupTarget has not been set, you'll be presented with an error. + +To create a backup, + +1. Navigate to the **Volume** menu. +2. Select the volume you wish to back up. +3. Click **Create Backup.** +4. Add any appropriate labels and click OK. + +**Result:** The backup is created. To see it, click **Backup** in the top navigation bar. + +For information on how to restore a volume from a snapshot, refer to [this page.](../restore-from-a-backup) \ No newline at end of file diff --git a/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-from-a-backup.md b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-from-a-backup.md new file mode 100644 index 000000000..e2f053a63 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-from-a-backup.md @@ -0,0 +1,21 @@ +--- +title: Restore from a Backup +weight: 3 +--- + +Longhorn can easily restore backups to a volume. + +For more information on how backups work, refer to the [concepts](../../../concepts/#3-backups-and-secondary-storage) section. + +When you restore a backup, it creates a volume of the same name by default. If a volume with the same name as the backup already exists, the backup will not be restored. + +To restore a backup, + +1. Navigate to the **Backup.** menu +2. Select the backup(s) you wish to restore and click **Restore Latest Backup.** +3. In the **Name** field, select the volume you wish to restore. +4. Click **OK.** + +You can then create the PV/PVC from the volume after restoring a volume from a backup. Here you can specify the `storageClassName` or leave it empty to use the `storageClassName` inherited from the PVC of the backup volume. The `StorageClass` should be already in the cluster to prevent any further issue. + +**Result:** The restored volume is available on the **Volume** page. \ No newline at end of file diff --git a/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-recurring-jobs-from-a-backup.md b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-recurring-jobs-from-a-backup.md new file mode 100644 index 000000000..ebcceea84 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-recurring-jobs-from-a-backup.md @@ -0,0 +1,62 @@ +--- +title: Restore Volume Recurring Jobs from a Backup +weight: 5 +--- + +Since v1.4.0, Longhorn supports recurring jobs backup and restore along with the volume backup and restore. When restoring a backup volume, if users enable the `Restore Volume Recurring Jobs` setting, the original recurring jobs of the volume will be restored back accordingly. + +For more information on the setting `Restore Volume Recurring Jobs`, refer to the [settings](../../../references/settings/#restore-volume-recurring-jobs) section. + +For more information on how volume backup works, refer to the [concepts](../../../concepts/#3-backups-and-secondary-storage) section. + +When restoring a volume with recurring jobs, Longhorn will restore them together. If the volume name already exists, the volume and the recurring jobs will not be restored. If the recurring job name already exists but the spec is different, the restoring recurring job will be created with a randomly generated name to avoid conflict. Otherwise, Longhorn will try to reuse existing recurring jobs instead if they are the same as restoring recurring jobs of a backup volume. 
+ +By default, Longhorn will not automatically restore volume recurring jobs, users can enable the automatic restoration by Longhorn UI or kubectl. + +## Via Longhorn UI + +1. Navigate to the **Setting** menu and click **General** +2. Enable the `Restore Volume Recurring Jobs` +3. Navigate to the **Backup** menu +4. Select the backup(s) you wish to restore and click **Restore Latest Backup.** +5. In the **Name** field, select the volume you wish to restore. +6. Click **OK** + +## Via Command Line + +```bash +# kubectl -n longhorn-system edit settings.longhorn.io restore-volume-recurring-jobs +``` + +Then, set the value to `true`. + +```text +# kubectl -n longhorn-system get setting restore-volume-recurring-jobs +NAME VALUE AGE +restore-volume-recurring-jobs false 28m +``` + +### Example of Volume Specific Setting + +```yaml +apiVersion: longhorn.io/v1beta2 +kind: Volume +metadata: + labels: + longhornvolume: vol-01 + name: vol-01 + namespace: longhorn-system +spec: + restoreVolumeRecurringJob: ignored + engineImage: longhornio/longhorn-engine:v1.4.0 + fromBackup: "s3://backupbucket@us-east-1?volume=minio-vol01&backup=backup-eeb2782d5b2f42bb" + frontend: blockdev +``` + +Users can override the setting `restore-volume-recurring-jobs` by the volume spec property `spec.restoreVolumeRecurringJob`. + +- **ignored**. This is the default option that instructs Longhorn to inherit from the global setting. +- **enabled**. This option instructs Longhorn to restore volume recurring jobs from the backup target forcibly. +- **disabled**. This option instructs Longhorn no restoring volume recurring jobs should be done. + +**Result:** The restored volume recurring jobs are available on the **RecurringJob** page. diff --git a/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-statefulset.md b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-statefulset.md new file mode 100644 index 000000000..015225f94 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/restore-statefulset.md @@ -0,0 +1,138 @@ +--- +title: Restoring Volumes for Kubernetes StatefulSets +weight: 4 +--- +Longhorn supports restoring backups, and one of the use cases for this feature is to restore data for use in a Kubernetes StatefulSet, which requires restoring a volume for each replica that was backed up. + +To restore, follow the below instructions. The example below uses a StatefulSet with one volume attached to each Pod and two replicas. + +1. Connect to the `Longhorn UI` page in your web browser. Under the `Backup` tab, select the name of the StatefulSet volume. Click the dropdown menu of the volume entry and restore it. Name the volume something that can easily be referenced later for the `Persistent Volumes`. + - Repeat this step for each volume you need restored. + - For example, if restoring a StatefulSet with two replicas that had volumes named `pvc-01a` and `pvc-02b`, the restore could look like this: + + | Backup Name | Restored Volume | + |-------------|-------------------| + | pvc-01a | statefulset-vol-0 | + | pvc-02b | statefulset-vol-1 | + +2. In Kubernetes, create a `Persistent Volume` for each Longhorn volume that was created. Name the volumes something that can easily be referenced later for the `Persistent Volume Claims`. `storage` capacity, `numberOfReplicas`, `storageClassName`, and `volumeHandle` must be replaced below. In the example, we're referencing `statefulset-vol-0` and `statefulset-vol-1` in Longhorn and using `longhorn` as our `storageClassName`. 
+ + ``` + apiVersion: v1 + kind: PersistentVolume + metadata: + name: statefulset-vol-0 + spec: + capacity: + storage: # must match size of Longhorn volume + volumeMode: Filesystem + accessModes: + - ReadWriteOnce + persistentVolumeReclaimPolicy: Delete + csi: + driver: driver.longhorn.io # driver must match this + fsType: ext4 + volumeAttributes: + numberOfReplicas: # must match Longhorn volume value + staleReplicaTimeout: '30' # in minutes + volumeHandle: statefulset-vol-0 # must match volume name from Longhorn + storageClassName: longhorn # must be same name that we will use later + --- + apiVersion: v1 + kind: PersistentVolume + metadata: + name: statefulset-vol-1 + spec: + capacity: + storage: # must match size of Longhorn volume + volumeMode: Filesystem + accessModes: + - ReadWriteOnce + persistentVolumeReclaimPolicy: Delete + csi: + driver: driver.longhorn.io # driver must match this + fsType: ext4 + volumeAttributes: + numberOfReplicas: # must match Longhorn volume value + staleReplicaTimeout: '30' + volumeHandle: statefulset-vol-1 # must match volume name from Longhorn + storageClassName: longhorn # must be same name that we will use later + ``` +3. In the `namespace` the `StatefulSet` will be deployed in, create PersistentVolume Claims **for each** `Persistent Volume`. The name of the `Persistent Volume Claim` must follow this naming scheme: + + ``` + -- + ``` + StatefulSet Pods are zero-indexed. In this example, the name of the `Volume Claim + Template` is `data`, the name of the `StatefulSet` is `webapp`, and there + are two replicas, which are indexes `0` and `1`. + + ``` + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: data-webapp-0 + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 2Gi # must match size from earlier + storageClassName: longhorn # must match name from earlier + volumeName: statefulset-vol-0 # must reference Persistent Volume + --- + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: data-webapp-1 + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 2Gi # must match size from earlier + storageClassName: longhorn # must match name from earlier + volumeName: statefulset-vol-1 # must reference Persistent Volume + ``` + +4. Create the `StatefulSet`: + + ``` + apiVersion: apps/v1beta2 + kind: StatefulSet + metadata: + name: webapp # match this with the PersistentVolumeClaim naming scheme + spec: + selector: + matchLabels: + app: nginx # has to match .spec.template.metadata.labels + serviceName: "nginx" + replicas: 2 # by default is 1 + template: + metadata: + labels: + app: nginx # has to match .spec.selector.matchLabels + spec: + terminationGracePeriodSeconds: 10 + containers: + - name: nginx + image: registry.k8s.io/nginx-slim:0.8 + ports: + - containerPort: 80 + name: web + volumeMounts: + - name: data + mountPath: /usr/share/nginx/html + volumeClaimTemplates: + - metadata: + name: data # match this with the PersistentVolumeClaim naming scheme + spec: + accessModes: [ "ReadWriteOnce" ] + storageClassName: longhorn # must match name from earlier + resources: + requests: + storage: 2Gi # must match size from earlier + ``` + +**Result:** The restored data should now be accessible from inside the `StatefulSet` +`Pods`. 
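+Before and after the `StatefulSet` starts, the wiring can be sanity-checked directly with `kubectl`; a short sketch using the names from the example above:
+
+```bash
+# The pre-created PVs should be Bound to the matching PVCs rather than dynamically provisioned.
+kubectl get pv statefulset-vol-0 statefulset-vol-1
+kubectl get pvc data-webapp-0 data-webapp-1
+
+# Once the StatefulSet pods are Running, the restored data should be visible at the mount path.
+kubectl exec webapp-0 -- ls /usr/share/nginx/html
+```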
diff --git a/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/set-backup-target.md b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/set-backup-target.md new file mode 100644 index 000000000..d89648972 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/backup-and-restore/set-backup-target.md @@ -0,0 +1,435 @@ +--- +title: Setting a Backup Target +weight: 1 +--- + +A backup target is an endpoint used to access a backup store in Longhorn. A backup store is an NFS server, SMB/CIFS server, Azure Blob Storage server, or S3 compatible server that stores the backups of Longhorn volumes. The backup target can be set at `Settings/General/BackupTarget`. + +For more information about how the backupstore works in Longhorn, see the [concepts section.](../../../concepts/#3-backups-and-secondary-storage) + +If you don't have access to AWS S3 or want to give the backupstore a try first, we've also provided a way to [setup a local S3 testing backupstore](#set-up-a-local-testing-backupstore) using [MinIO](https://minio.io/). + +Longhorn also supports setting up recurring snapshot/backup jobs for volumes, via Longhorn UI or Kubernetes Storage Class. See [here](../../scheduling-backups-and-snapshots) for details. + +This page covers the following topics: + +- [Set up AWS S3 Backupstore](#set-up-aws-s3-backupstore) +- [Set up GCP Cloud Storage Backupstore](#set-up-gcp-cloud-storage-backupstore) +- [Set up a Local Testing Backupstore](#set-up-a-local-testing-backupstore) +- [Using a self-signed SSL certificate for S3 communication](#using-a-self-signed-ssl-certificate-for-s3-communication) +- [Enable virtual-hosted-style access for S3 compatible Backupstore](#enable-virtual-hosted-style-access-for-s3-compatible-backupstore) +- [Set up NFS Backupstore](#set-up-nfs-backupstore) +- [Set up SMB/CIFS Backupstore](#set-up-smbcifs-backupstore) +- [Set up Azure Blob Storage Backupstore](#set-up-azure-blob-storage-backupstore) + +### Set up AWS S3 Backupstore + +1. Create a new bucket in [AWS S3.](https://aws.amazon.com/s3/) + +2. Set permissions for Longhorn. There are two options for setting up the credentials. The first is that you can set up a Kubernetes secret with the credentials of an AWS IAM user. The second is that you can use a third-party application to manage temporary AWS IAM permissions for a Pod via annotations rather than operating with AWS credentials. + - Option 1: Create a Kubernetes secret with IAM user credentials + + 1. Follow the [guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html#id_users_create_console) to create a new AWS IAM user, with the following permissions set. Edit the `Resource` section to use your S3 bucket name: + + ```json + { + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "GrantLonghornBackupstoreAccess0", + "Effect": "Allow", + "Action": [ + "s3:PutObject", + "s3:GetObject", + "s3:ListBucket", + "s3:DeleteObject" + ], + "Resource": [ + "arn:aws:s3:::", + "arn:aws:s3:::/*" + ] + } + ] + } + ``` + + 2. Create a Kubernetes secret with a name such as `aws-secret` in the namespace where Longhorn is placed (`longhorn-system` by default). 
The secret must be created in the `longhorn-system` namespace for Longhorn to access it: + + ```shell + kubectl create secret generic \ + --from-literal=AWS_ACCESS_KEY_ID= \ + --from-literal=AWS_SECRET_ACCESS_KEY= \ + -n longhorn-system + ``` + + - Option 2: Set permissions with IAM temporary credentials by AWS STS AssumeRole (kube2iam or kiam) + + [kube2iam](https://github.com/jtblin/kube2iam) or [kiam](https://github.com/uswitch/kiam) is a Kubernetes application that allows managing AWS IAM permissions for Pod via annotations rather than operating on AWS credentials. Follow the instructions in the GitHub repository for kube2iam or kiam to install it into the Kubernetes cluster. + + 1. Follow the [guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html#roles-creatingrole-service-console) to create a new AWS IAM role for AWS S3 service, with the following permissions set: + + ```json + { + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "GrantLonghornBackupstoreAccess0", + "Effect": "Allow", + "Action": [ + "s3:PutObject", + "s3:GetObject", + "s3:ListBucket", + "s3:DeleteObject" + ], + "Resource": [ + "arn:aws:s3:::", + "arn:aws:s3:::/*" + ] + } + ] + } + ``` + + 2. Edit the AWS IAM role with the following trust relationship: + + ```json + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Service": "ec2.amazonaws.com" + }, + "Action": "sts:AssumeRole" + }, + { + "Effect": "Allow", + "Principal": { + "AWS": "arn:aws:iam:::role/" + }, + "Action": "sts:AssumeRole" + } + ] + } + ``` + + 3. Create a Kubernetes secret with a name such as `aws-secret` in the namespace where Longhorn is placed (`longhorn-system` by default). The secret must be created in the `longhorn-system` namespace for Longhorn to access it: + + ```shell + kubectl create secret generic \ + --from-literal=AWS_IAM_ROLE_ARN= \ + -n longhorn-system + ``` + +3. Go to the Longhorn UI. In the top navigation bar, click **Settings.** In the Backup section, set **Backup Target** to: + + ```text + s3://@/ + ``` + + Make sure that you have `/` at the end, otherwise you will get an error. A subdirectory (prefix) may be used: + + ```text + s3://@/mypath/ + ``` + + Also make sure you've set **`` in the URL**. + + For example, For AWS, you can find the region codes [here.](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html) + + For Google Cloud Storage, you can find the region codes [here.](https://cloud.google.com/storage/docs/locations) + +4. In the Backup section set **Backup Target Credential Secret** to: + + ``` + aws-secret + ``` + This is the secret name with AWS credentials or AWS IAM role. + +**Result:** Longhorn can store backups in S3. To create a backup, see [this section.](../create-a-backup) + +**Note:** If you operate Longhorn behind a proxy and you want to use AWS S3 as the backupstore, you must provide Longhorn information about your proxy in the `aws-secret` as below: +```shell +kubectl create secret generic \ + --from-literal=AWS_ACCESS_KEY_ID= \ + --from-literal=AWS_SECRET_ACCESS_KEY= \ + --from-literal=HTTP_PROXY= \ + --from-literal=HTTPS_PROXY= \ + --from-literal=NO_PROXY= \ + -n longhorn-system +``` + +Make sure `NO_PROXY` contains the network addresses, network address ranges and domains that should be excluded from using the proxy. 
In order for Longhorn to operate, the minimum required values for `NO_PROXY` are: +* localhost +* 127.0.0.1 +* 0.0.0.0 +* 10.0.0.0/8 (K8s components' IPs) +* 192.168.0.0/16 (internal IPs in the cluster) + +### Set up GCP Cloud Storage Backupstore + +1. Create a new bucket in [Google Cloud Storage](https://console.cloud.google.com/storage/browser?referrer=search&project=elite-protocol-319303) +2. Create a GCP serviceaccount in [IAM & Admin](https://console.cloud.google.com/iam-admin) +3. Give the GCP serviceaccount permissions to read, write, and delete objects in the bucket. + + The serviceaccount will require the `roles/storage.objectAdmin` role to read, write, and delete objects in the bucket. + + Here is a reference to the GCP IAM roles you have available for granting access to a serviceaccount https://cloud.google.com/storage/docs/access-control/iam-roles. + +> Note: Consider creating an IAM condition to reduce how many buckets this serviceaccount has object admin access to. + +4. Navigate to your [buckets in cloud storage](https://console.cloud.google.com/storage/browser) and select your newly created bucket. +5. Go to the cloud storage's settings menu and navigate to the [interoperability tab](https://console.cloud.google.com/storage/settings;tab=interoperability) +6. Scroll down to _Service account HMAC_ and press `+ CREATE A KEY FOR A SERVICE ACCOUNT` +7. Select the GCP serviceaccount you created earlier and press `CREATE KEY` +8. Save the _Access Key_ and _Secret_. + + Also note down the configured _Storage URI_ under the _Request Endpoint_ while you're in the interoperability menu. + +- The Access Key will be mapped to the `AWS_ACCESS_KEY_ID` field in the Kubernetes secret we create later. +- The Secret will be mapped to the `AWS_SECRET_ACCESS_KEY` field in the Kubernetes secret we create later. +- The Storage URI will be mapped to the `AWS_ENDPOINTS` field in the Kubernetes secret we create later. + +9. Go to the Longhorn UI. In the top navigation bar, click **Settings.** In the Backup section, set **Backup Target** to + +``` +s3://${BUCKET_NAME}@us/ +``` + +And set **Backup Target Credential Secret** to: + +``` +longhorn-gcp-backups +``` + +10. Create a Kubernetes secret named `longhorn-gcp-backups` in the `longhorn-system` namespace with the following content: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: longhorn-gcp-backups + namespace: longhorn-system +type: Opaque +stringData: + AWS_ACCESS_KEY_ID: GOOG1EBYHGDE4WIGH2RDYNZWWWDZ5GMQDRMNSAOTVHRAILWAMIZ2O4URPGOOQ + AWS_ENDPOINTS: https://storage.googleapis.com + AWS_SECRET_ACCESS_KEY: BKoKpIW021s7vPtraGxDOmsJbkV/0xOVBG73m+8f +``` +> Note: The secret can be named whatever you like as long as they match what's in longhorn's settings. + +Once the secret is created and Longhorn's settings are saved, navigate to the backup tab in Longhorn. If there are any issues, they should pop up as a toast notification. + +If you don't get any error messages, try creating a backup and confirm the content is pushed out to your new bucket. + +### Set up a Local Testing Backupstore +We provides two testing purpose backupstore based on NFS server and MinIO S3 server for testing, in `./deploy/backupstores`. + +1. Use following command to setup a MinIO S3 server for the backupstore after `longhorn-system` was created. + + ``` + kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/backupstores/minio-backupstore.yaml + ``` + +2. Go to the Longhorn UI. 
In the top navigation bar, click **Settings.** In the Backup section, set **Backup Target** to + + ``` + s3://backupbucket@us-east-1/ + ``` + And set **Backup Target Credential Secret** to: + ``` + minio-secret + ``` + + The `minio-secret` yaml looks like this: + + ``` + apiVersion: v1 + kind: Secret + metadata: + name: minio-secret + namespace: longhorn-system + type: Opaque + data: + AWS_ACCESS_KEY_ID: bG9uZ2hvcm4tdGVzdC1hY2Nlc3Mta2V5 # longhorn-test-access-key + AWS_SECRET_ACCESS_KEY: bG9uZ2hvcm4tdGVzdC1zZWNyZXQta2V5 # longhorn-test-secret-key + AWS_ENDPOINTS: aHR0cHM6Ly9taW5pby1zZXJ2aWNlLmRlZmF1bHQ6OTAwMA== # https://minio-service.default:9000 + AWS_CERT: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURMRENDQWhTZ0F3SUJBZ0lSQU1kbzQycGhUZXlrMTcvYkxyWjVZRHN3RFFZSktvWklodmNOQVFFTEJRQXcKR2pFWU1CWUdBMVVFQ2hNUFRHOXVaMmh2Y200Z0xTQlVaWE4wTUNBWERUSXdNRFF5TnpJek1EQXhNVm9ZRHpJeApNakF3TkRBek1qTXdNREV4V2pBYU1SZ3dGZ1lEVlFRS0V3OU1iMjVuYUc5eWJpQXRJRlJsYzNRd2dnRWlNQTBHCkNTcUdTSWIzRFFFQkFRVUFBNElCRHdBd2dnRUtBb0lCQVFEWHpVdXJnUFpEZ3pUM0RZdWFlYmdld3Fvd2RlQUQKODRWWWF6ZlN1USs3K21Oa2lpUVBvelVVMmZvUWFGL1BxekJiUW1lZ29hT3l5NVhqM1VFeG1GcmV0eDBaRjVOVgpKTi85ZWFJNWRXRk9teHhpMElPUGI2T0RpbE1qcXVEbUVPSXljdjRTaCsvSWo5Zk1nS0tXUDdJZGxDNUJPeThkCncwOVdkckxxaE9WY3BKamNxYjN6K3hISHd5Q05YeGhoRm9tb2xQVnpJbnlUUEJTZkRuSDBuS0lHUXl2bGhCMGsKVHBHSzYxc2prZnFTK3hpNTlJeHVrbHZIRXNQcjFXblRzYU9oaVh6N3lQSlorcTNBMWZoVzBVa1JaRFlnWnNFbQovZ05KM3JwOFhZdURna2kzZ0UrOElXQWRBWHExeWhqRDdSSkI4VFNJYTV0SGpKUUtqZ0NlSG5HekFnTUJBQUdqCmF6QnBNQTRHQTFVZER3RUIvd1FFQXdJQ3BEQVRCZ05WSFNVRUREQUtCZ2dyQmdFRkJRY0RBVEFQQmdOVkhSTUIKQWY4RUJUQURBUUgvTURFR0ExVWRFUVFxTUNpQ0NXeHZZMkZzYUc5emRJSVZiV2x1YVc4dGMyVnlkbWxqWlM1awpaV1poZFd4MGh3Ui9BQUFCTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFDbUZMMzlNSHVZMzFhMTFEajRwMjVjCnFQRUM0RHZJUWozTk9kU0dWMmQrZjZzZ3pGejFXTDhWcnF2QjFCMVM2cjRKYjJQRXVJQkQ4NFlwVXJIT1JNU2MKd3ViTEppSEtEa0Jmb2U5QWI1cC9VakpyS0tuajM0RGx2c1cvR3AwWTZYc1BWaVdpVWorb1JLbUdWSTI0Q0JIdgpnK0JtVzNDeU5RR1RLajk0eE02czNBV2xHRW95YXFXUGU1eHllVWUzZjFBWkY5N3RDaklKUmVWbENtaENGK0JtCmFUY1RSUWN3cVdvQ3AwYmJZcHlERFlwUmxxOEdQbElFOW8yWjZBc05mTHJVcGFtZ3FYMmtYa2gxa3lzSlEralAKelFadHJSMG1tdHVyM0RuRW0yYmk0TktIQVFIcFc5TXUxNkdRakUxTmJYcVF0VEI4OGpLNzZjdEg5MzRDYWw2VgotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0t + ``` + For more information on creating a secret, see [the Kubernetes documentation.](https://kubernetes.io/docs/concepts/configuration/secret/#creating-a-secret-manually) The secret must be created in the `longhorn-system` namespace for Longhorn to access it. + + > Note: Make sure to use `echo -n` when generating the base64 encoding, otherwise an new line will be added at the end of the string and it will cause error when accessing the S3. + +3. Click the **Backup** tab in the UI. It should report an empty list without any errors. + +**Result:** Longhorn can store backups in S3. To create a backup, see [this section.](../create-a-backup) + +### Using a self-signed SSL certificate for S3 communication +If you want to use a self-signed SSL certificate, you can specify AWS_CERT in the Kubernetes secret you provided to Longhorn. See the example in [Set up a Local Testing Backupstore](#set-up-a-local-testing-backupstore). +It's important to note that the certificate needs to be in PEM format, and must be its own CA. Or one must include a certificate chain that contains the CA certificate. +To include multiple certificates, one can just concatenate the different certificates (PEM files). 
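+For example, a self-signed CA chain could be concatenated and attached to the backup target secret as follows. This is only a sketch: the PEM file names are hypothetical, and `kubectl create secret --from-file` takes care of the base64 encoding for you (delete the existing secret first if it already exists):
+
+```bash
+# Concatenate the CA certificate chain into a single PEM file.
+cat my-ca.pem my-intermediate-ca.pem > ca-chain.pem
+
+# Create the backup target secret with the certificate chain included as AWS_CERT.
+kubectl -n longhorn-system create secret generic minio-secret \
+  --from-literal=AWS_ACCESS_KEY_ID=<access-key> \
+  --from-literal=AWS_SECRET_ACCESS_KEY=<secret-key> \
+  --from-literal=AWS_ENDPOINTS=https://minio-service.default:9000 \
+  --from-file=AWS_CERT=ca-chain.pem
+```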
+ +### Enable virtual-hosted-style access for S3 compatible Backupstore +**You may need to enable this new addressing approach for your S3 compatible Backupstore when** +1. you want to switch to this new access style right now so that you won't need to worry about [Amazon S3 Path Deprecation Plan](https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/); +2. the backupstore you are using supports virtual-hosted-style access only, e.g., Alibaba Cloud(Aliyun) OSS; +3. you have configurated `MINIO_DOMAIN` environment variable to [enable virtual-host-style requests for the MinIO server](https://docs.min.io/docs/minio-server-configuration-guide.html); +4. the error `...... error: AWS Error: SecondLevelDomainForbidden Please use virtual hosted style to access. .....` is triggered. + +**The way to enable virtual-hosted-style access** +1. Add a new field `VIRTUAL_HOSTED_STYLE` with value `true` to your backup target secret. e.g.: + ``` + apiVersion: v1 + kind: Secret + metadata: + name: s3-compatible-backup-target-secret + namespace: longhorn-system + type: Opaque + data: + AWS_ACCESS_KEY_ID: bG9uZ2hvcm4tdGVzdC1hY2Nlc3Mta2V5 + AWS_SECRET_ACCESS_KEY: bG9uZ2hvcm4tdGVzdC1zZWNyZXQta2V5 + AWS_ENDPOINTS: aHR0cHM6Ly9taW5pby1zZXJ2aWNlLmRlZmF1bHQ6OTAwMA== + VIRTUAL_HOSTED_STYLE: dHJ1ZQ== # true + ``` +2. Deploy/update the secret and set it in `Settings/General/BackupTargetSecret`. + +### Set up NFS Backupstore + +For using NFS server as backupstore, NFS server must support NFSv4. + +The target URL should look like this: + +``` +nfs://longhorn-test-nfs-svc.default:/opt/backupstore +``` + +You can find an example NFS backupstore for testing purpose [here](https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/deploy/backupstores/nfs-backupstore.yaml). + +**Result:** Longhorn can store backups in NFS. To create a backup, see [this section.](../create-a-backup) + +### Set up SMB/CIFS Backupstore + +Before configuring a SMB/CIFS backupstore, a credential secret for the backupstore can be created and deployed by + ``` + #!/bin/bash + + USERNAME=${Username of SMB/CIFS Server} + PASSWORD=${Password of SMB/CIFS Server} + + CIFS_USERNAME=`echo -n ${USERNAME} | base64` + CIFS_PASSWORD=`echo -n ${PASSWORD} | base64` + + cat <>cifs_secret.yml + apiVersion: v1 + kind: Secret + metadata: + name: cifs-secret + namespace: longhorn-system + type: Opaque + data: + CIFS_USERNAME: ${CIFS_USERNAME} + CIFS_PASSWORD: ${CIFS_PASSWORD} + EOF + + kubectl apply -f cifs_secret.yml + ``` + +Then, navigate to Longhorn UI > Setting > General > Backup + +1. Set **Backup Target**. The target URL should look like this: + + ``` + cifs://longhorn-test-cifs-svc.default/opt/backupstore + ``` + +2. Set **Backup Target Credential Secret** + + ``` + cifs-secret + ``` + This is the secret name with CIFS credentials. + +You can find an example CIFS backupstore for testing purpose [here](https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/deploy/backupstores/cifs-backupstore.yaml). + +**Result:** Longhorn can store backups in CIFS. To create a backup, see [this section.](../create-a-backup) + +### Set up Azure Blob Storage Backupstore + +1. Create a new container in [Azure Blob Storage Service](https://portal.azure.com/) + +2. Before configuring an Azure Blob Storage backup store, create a Kubernetes secret with a name such as `azblob-secret` in the namespace where Longhorn is installed (`longhorn-system`). The secret must be created in the same namespace for Longhorn to access it. 
+ + - The Account Name will be the `AZBLOB_ACCOUNT_NAME` field in the secret. + - The Account Secret Key will be the `AZBLOB_ACCOUNT_KEY` field in the secret. + - The Storage URI will be the `AZBLOB_ENDPOINT` field in the secret. + + - By a manifest: + ```shell + #!/bin/bash + + # AZBLOB_ACCOUNT_NAME: Account name of Azure Blob Storage server + # AZBLOB_ACCOUNT_KEY: Account key of Azure Blob Storage server + # AZBLOB_ENDPOINT: Endpoint of Azure Blob Storage server + # AZBLOB_CERT: SSL certificate for Azure Blob Storage server + + AZBLOB_ACCOUNT_NAME=`echo -n ${AZBLOB_ACCOUNT_NAME} | base64` + AZBLOB_ACCOUNT_KEY=`echo -n ${AZBLOB_ACCOUNT_KEY} | base64` + AZBLOB_ENDPOINT=`echo -n ${AZBLOB_ENDPOINT} | base64` + AZBLOB_CERT=`echo -n ${AZBLOB_CERT} | base64` + + cat <>azblob_secret.yml + apiVersion: v1 + kind: Secret + metadata: + name: azblob-secret + namespace: longhorn-system + type: Opaque + data: + AZBLOB_ACCOUNT_NAME: ${AZBLOB_ACCOUNT_NAME} + AZBLOB_ACCOUNT_KEY: ${AZBLOB_ACCOUNT_KEY} + #AZBLOB_ENDPOINT: ${AZBLOB_ENDPOINT} + #AZBLOB_CERT: ${AZBLOB_CERT} + #HTTP_PROXY: aHR0cDovLzEwLjIxLjkxLjUxOjMxMjg= + #HTTPS_PROXY: aHR0cDovLzEwLjIxLjkxLjUxOjMxMjg= + EOF + + kubectl apply -f azblob_secret.yml + ``` + + - CLI command: + ```shell + kubectl create secret generic \ + --from-literal=AZBLOB_ACCOUNT_NAME= \ + --from-literal=AZBLOB_ACCOUNT_KEY= \ + --from-literal=HTTP_PROXY= \ + --from-literal=HTTPS_PROXY= \ + --from-literal=NO_PROXY= \ + -n longhorn-system + ``` + +Then, navigate to Longhorn UI > Setting > General > Backup + +1. Set **Backup Target**. The target URL should look like this: + + ```txt + azblob://[your-container-name]@[endpoint-suffix]/ + ``` + + Make sure that you have `/` at the end, otherwise you will get an error. A subdirectory (prefix) may be used: + + ```text + azblob://[your-container-name]@[endpoint-suffix]/my-path/ + ``` + + - If you set `` in the URL, the default endpoint suffix will be `core.windows.net`. + - If you set `AZBLOB_ENDPOINT` in the secret, Longhorn will use `AZBLOB_ENDPOINT` as your storage URL, and `` will not be used even if it has been set. + +2. Set **Backup Target Credential Secret** + + ```txt + azblob-secret + ``` + + +After configuring the above settings, you can manage backups on Azure Blob storage. See [how to create backup](../create-a-backup) for details. 
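+Regardless of which backupstore type you use, the resulting configuration can be double-checked from the command line. A small sketch, assuming the setting ids are `backup-target` and `backup-target-credential-secret`:
+
+```bash
+# Confirm the configured backup target URL and credential secret name.
+kubectl -n longhorn-system get settings.longhorn.io backup-target backup-target-credential-secret
+
+# Confirm the referenced credential secret exists, e.g. the azblob-secret from the example above.
+kubectl -n longhorn-system get secret azblob-secret
+```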
diff --git a/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/_index.md b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/_index.md new file mode 100644 index 000000000..30786afd4 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/_index.md @@ -0,0 +1,11 @@ +--- +title: CSI Snapshot Support +description: Creating and Restoring Longhorn Snapshots/Backups via the kubernetes CSI snapshot mechanism +weight: 3 +--- + +## History +- [GitHub Issue](https://github.com/longhorn/longhorn/issues/304) +- [Longhorn Enhancement Proposal](https://github.com/longhorn/longhorn/blob/master/enhancements/20200904-csi-snapshot-support.md) + +Available since v1.1.0 diff --git a/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-backing-image.md b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-backing-image.md new file mode 100644 index 000000000..83387ea2e --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-backing-image.md @@ -0,0 +1,221 @@ +--- +title: CSI VolumeSnapshot Associated with Longhorn BackingImage +weight: 2 +--- + +BackingImage in Longhorn is an object that represents a QCOW2 or RAW image which can be set as the backing/base image of a Longhorn volume. + +Instead of directly using Longhorn BackingImage resource for BackingImage management. You can also use the generic Kubernetes CSI VolumeSnapshot mechanism. To learn more about the CSI VolumeSnapshot mechanism, click [here](https://kubernetes.io/docs/concepts/storage/volume-snapshots/). + +> **Prerequisite:** CSI snapshot support needs to be enabled on your cluster. +> If your kubernetes distribution does not provide the kubernetes snapshot controller +> as well as the snapshot related custom resource definitions, you need to manually deploy them. +> For more information, see [Enable CSI Snapshot Support](../enable-csi-snapshot-support). + +## Create A CSI VolumeSnapshot Associated With Longhorn BackingImage + +To create a CSI VolumeSnapshot associated with a Longhorn BackingImage, you first need to create a `VolumeSnapshotClass` object +with the parameter `type` set to `bi` as follow: +```yaml +kind: VolumeSnapshotClass +apiVersion: snapshot.storage.k8s.io/v1 +metadata: + name: longhorn-snapshot-vsc +driver: driver.longhorn.io +deletionPolicy: Delete +parameters: + type: bi + # export-type default to raw if it is not given + export-type: qcow2 +``` +For more information about `VolumeSnapshotClass`, see the kubernetes documentation for [VolumeSnapshotClasses](https://kubernetes.io/docs/concepts/storage/volume-snapshot-classes/). + +After that, create a Kubernetes `VolumeSnapshot` object with `volumeSnapshotClassName` points to the name of the `VolumeSnapshotClass` (`longhorn-snapshot-vsc`) and +the `source` points to the PVC of the Longhorn volume for which a Longhorn BackingImage should be exported from. +```yaml +apiVersion: snapshot.storage.k8s.io/v1 +kind: VolumeSnapshot +metadata: + name: test-csi-volume-snapshot-longhorn-backing-image +spec: + volumeSnapshotClassName: longhorn-snapshot-vsc + source: + persistentVolumeClaimName: test-vol +``` + +**Result:** +A Longhorn BackingImage is created. The `VolumeSnapshot` object creation leads to the creation of a `VolumeSnapshotContent` Kubernetes object. 
+The `VolumeSnapshotContent` refers to a Longhorn BackingImage in its `VolumeSnapshotContent.snapshotHandle` field with the name `bi://backing?backingImageDataSourceType=export-from-volume&backingImage=${GENERATED_SNAPSHOT_NAME}&volume-name=test-vol&export-type=qcow2`. + +### Viewing the Longhorn BackingImage + +To see the BackingImage, click **Setting > Backing Image** in the top navigation bar and click the BackingImage mentioned in the `VolumeSnapshotContent.snapshotHandle`. + + +### How the CSI Mechanism Works in this Scenario + +When the VolumeSnapshot object is created with kubectl, the `VolumeSnapshot.uuid` field is used to identify a Longhorn BackingImage and the associated `VolumeSnapshotContent` object. + +This creates a new Longhorn BackingImage named `snapshot-uuid` and the CSI request returns. + +Afterwards a `VolumeSnapshotContent` object named `snapcontent-uuid` is created with the `VolumeSnapshotContent.readyToUse` flag is set to **true**. + + +## Restore PVC from CSI VolumeSnapshot Associated With Longhorn BackingImage +Create a `PersistentVolumeClaim` object where the `dataSource` field points to an existing `VolumeSnapshot` object that is associated with Longhorn BackingImage. + +The csi-provisioner will pick this up and instruct the Longhorn CSI driver to provision a new volume using the associated Longhorn BackingImage. + +An example `PersistentVolumeClaim` is below. The `dataSource` field needs to point to an existing `VolumeSnapshot` object. + +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: test-restore-pvc +spec: + storageClassName: longhorn + dataSource: + name: test-csi-volume-snapshot-longhorn-backing-image + kind: VolumeSnapshot + apiGroup: snapshot.storage.k8s.io + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 5Gi +``` + +### Restore a Longhorn BackingImage that Has No Associated `VolumeSnapshot` (pre-provision) + +You can use the CSI mechanism to restore Longhorn BackingImage that has not been created via the CSI mechanism. +To restore Longhorn BackingImage that has not been created via the CSI mechanism, you have to first manually create a `VolumeSnapshot` and `VolumeSnapshotContent` object for the BackingImage. + +Create a `VolumeSnapshotContent` object with the `snapshotHandle` field set to `bi://backing?backingImageDataSourceType=${TYPE}&backingImage=${BACKINGIMAGE_NAME}&backingImageChecksum=${backingImageChecksum}&${OTHER_PARAMETES}` which point to an existing BackingImage. + +- Users need to provide following query parameters in `snapshotHandle` for validation purpose: + - `backingImageDataSourceType`: `sourceType` of existing BackingImage, e.g. `export-from-volume`, `download` + - `backingImage`: Name of the BackingImage + - `backingImageChecksum`: Optional. Checksum of the BackingImage. + - you should also provide the `sourceParameters` of existing BackingImage in the `snapshotHandle` based on the `backingImageDataSourceType` + - `export-from-volume`: + - `volume-name`: volume to be expoted from. + - `export-type`: qcow2 or raw. + - `download`: + - `url`: url of the BackingImage. + - `checksum`: optional. + +The parameters can be retrieved from the **Setting > Backing Image** page in the Longhorn UI. 
+
+```yaml
+apiVersion: snapshot.storage.k8s.io/v1
+kind: VolumeSnapshotContent
+metadata:
+  name: test-existing-backing
+spec:
+  volumeSnapshotClassName: longhorn-snapshot-vsc
+  driver: driver.longhorn.io
+  deletionPolicy: Delete
+  source:
+    snapshotHandle: bi://backing?backingImageDataSourceType=download&backingImage=test-bi&url=https%3A%2F%2Flonghorn-backing-image.s3-us-west-1.amazonaws.com%2Fparrot.qcow2&backingImageChecksum=bd79ab9e6d45abf4f3f0adf552a868074dd235c4698ce7258d521160e0ad79ffe555b94e7d4007add6e1a25f4526885eb25c53ce38f7d344dd4925b9f2cb5d3b
+  volumeSnapshotRef:
+    name: test-snapshot-existing-backing
+    namespace: default
+```
+
+Create the associated `VolumeSnapshot` object with the `name` field set to `test-snapshot-existing-backing`, where the `source` field refers to a `VolumeSnapshotContent` object via the `volumeSnapshotContentName` field.
+
+This differs from the creation of a BackingImage, in which case the `source` field refers to a `PersistentVolumeClaim` via the `persistentVolumeClaimName` field.
+
+Only one type of reference can be set for a `VolumeSnapshot` object.
+
+```yaml
+apiVersion: snapshot.storage.k8s.io/v1
+kind: VolumeSnapshot
+metadata:
+  name: test-snapshot-existing-backing
+spec:
+  volumeSnapshotClassName: longhorn-snapshot-vsc
+  source:
+    volumeSnapshotContentName: test-existing-backing
+```
+
+Now you can create a `PersistentVolumeClaim` object that refers to the newly created `VolumeSnapshot` object.
+For an example, see [Restore PVC from CSI VolumeSnapshot Associated With Longhorn BackingImage](#restore-pvc-from-csi-volumesnapshot-associated-with-longhorn-backingimage) above.
+
+
+### Restore a Longhorn BackingImage that Has Not Been Created (on-demand provision)
+
+You can use the CSI mechanism to restore a Longhorn BackingImage that has not been created yet. This mechanism supports only the following two kinds of BackingImage data sources:
+
+1. `download`: Download a file from a URL as a BackingImage.
+2. `export-from-volume`: Export an existing in-cluster volume as a backing image.
+
+Users need to create the `VolumeSnapshotContent` with an associated `VolumeSnapshot`. The `snapshotHandle` of the `VolumeSnapshotContent` needs to provide the parameters of the data source. The examples below are for a non-existing BackingImage `test-bi` with the two different data sources.
+
+1. `download`: Users need to provide the following parameters:
+   - `backingImageDataSourceType`: `download` for on-demand download.
+   - `backingImage`: Name of the BackingImage
+   - `url`: Download the file from a URL as a BackingImage.
+   - `backingImageChecksum`: Optional. Used for validating the file.
+   - example yaml:
+     ```yaml
+     apiVersion: snapshot.storage.k8s.io/v1
+     kind: VolumeSnapshotContent
+     metadata:
+       name: test-on-demand-backing
+     spec:
+       volumeSnapshotClassName: longhorn-snapshot-vsc
+       driver: driver.longhorn.io
+       deletionPolicy: Delete
+       source:
+         # NOTE: change this to provide the correct parameters
+         snapshotHandle: bi://backing?backingImageDataSourceType=download&backingImage=test-bi&url=https%3A%2F%2Flonghorn-backing-image.s3-us-west-1.amazonaws.com%2Fparrot.qcow2&backingImageChecksum=bd79ab9e6d45abf4f3f0adf552a868074dd235c4698ce7258d521160e0ad79ffe555b94e7d4007add6e1a25f4526885eb25c53ce38f7d344dd4925b9f2cb5d3b
+       volumeSnapshotRef:
+         name: test-snapshot-on-demand-backing
+         namespace: default
+     ```
+
+2. `export-from-volume`: Users need to provide the following parameters:
+   - `backingImageDataSourceType`: `export-from-volume` for on-demand export.
+ - `backingImage`: Name of the BackingImage + - `volume-name`: Volume to be exported for the BackingImage + - `export-type`: Currently Longhorn supports `raw` or `qcow2` + - example yaml: + ```yaml + apiVersion: snapshot.storage.k8s.io/v1 + kind: VolumeSnapshotContent + metadata: + name: test-on-demand-backing + spec: + volumeSnapshotClassName: longhorn-snapshot-vsc + driver: driver.longhorn.io + deletionPolicy: Delete + source: + # NOTE: change this to provide the correct parameters + snapshotHandle: bi://backing?backingImageDataSourceType=export-from-volume&backingImage=test-bi&volume-name=vol-export-src&export-type=qcow2 + volumeSnapshotRef: + name: test-snapshot-on-demand-backing + namespace: default + ``` + +Create the associated `VolumeSnapshot` object with the `name` field set to `test-snapshot-on-demand-backing`, where the `source` field refers to a `VolumeSnapshotContent` object via the `volumeSnapshotContentName` field. + +This differs from the creation of a BackingImage, in which case the `source` field refers to a `PerstistentVolumeClaim` via the `persistentVolumeClaimName` field. + +Only one type of reference can be set for a `VolumeSnapshot` object. + +```yaml +apiVersion: snapshot.storage.k8s.io/v1beta1 +kind: VolumeSnapshot +metadata: + name: test-snapshot-on-demand-backing +spec: + volumeSnapshotClassName: longhorn-snapshot-vsc + source: + volumeSnapshotContentName: test-on-demand-backing +``` + +Now you can create a `PerstistantVolumeClaim` object that refers to the newly created `VolumeSnapshot` object. +Longhorn will create the BackingImage with the parameters provide in the `snapshotHandle`. +For an example see [Restore PVC from CSI VolumeSnapshot Associated With Longhorn BackingImage](#restore-pvc-from-csi-volumesnapshot-associated-with-longhorn-backingimage) above. diff --git a/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-backup.md b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-backup.md new file mode 100644 index 000000000..82982f9b6 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-backup.md @@ -0,0 +1,139 @@ +--- +title: CSI VolumeSnapshot Associated with Longhorn Backup +weight: 3 +--- + +Backups in Longhorn are objects in an off-cluster backupstore, and the endpoint to access the backupstore is the backup target. For more information, see [this section.](../../../concepts/#31-how-backups-work) + +To programmatically create backups, you can use the generic Kubernetes CSI VolumeSnapshot mechanism. To learn more about the CSI VolumeSnapshot mechanism, click [here](https://kubernetes.io/docs/concepts/storage/volume-snapshots/). + +> **Prerequisite:** CSI snapshot support needs to be enabled on your cluster. +> If your kubernetes distribution does not provide the kubernetes snapshot controller +> as well as the snapshot related custom resource definitions, you need to manually deploy them. +> For more information, see [Enable CSI Snapshot Support](../enable-csi-snapshot-support). 
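+
+Creating backups through the CSI mechanism also assumes that a backup target has already been configured. As a minimal sketch (the setting resource name `backup-target` is assumed here; verify it against your Longhorn installation), you can check the configured value with:
+
+```shell
+# Show the backup target setting in the Longhorn namespace.
+kubectl -n longhorn-system get settings.longhorn.io backup-target
+```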
+ +## Create A CSI VolumeSnapshot Associated With Longhorn Backup + +To create a CSI VolumeSnapshot associated with a Longhorn backup, you first need to create a `VolumeSnapshotClass` object +with the parameter `type` set to `bak` as follow: +```yaml +kind: VolumeSnapshotClass +apiVersion: snapshot.storage.k8s.io/v1 +metadata: + name: longhorn-backup-vsc +driver: driver.longhorn.io +deletionPolicy: Delete +parameters: + type: bak +``` +For more information about `VolumeSnapshotClass`, see the kubernetes documentation for [VolumeSnapshotClasses](https://kubernetes.io/docs/concepts/storage/volume-snapshot-classes/). + +After that, create a Kubernetes `VolumeSnapshot` object with `volumeSnapshotClassName` points to the name of the `VolumeSnapshotClass` (`longhorn-backup-vsc`) and +the `source` points to the PVC of the Longhorn volume for which a backup should be created. +```yaml +apiVersion: snapshot.storage.k8s.io/v1 +kind: VolumeSnapshot +metadata: + name: test-csi-volume-snapshot-longhorn-backup +spec: + volumeSnapshotClassName: longhorn-backup-vsc + source: + persistentVolumeClaimName: test-vol +``` + +**Result:** +A backup is created. The `VolumeSnapshot` object creation leads to the creation of a `VolumeSnapshotContent` Kubernetes object. +The `VolumeSnapshotContent` refers to a Longhorn backup in its `VolumeSnapshotContent.snapshotHandle` field with the name `bak://backup-volume/backup-name`. + +### Viewing the Backup + +To see the backup, click **Backup** in the top navigation bar and navigate to the backup-volume mentioned in the `VolumeSnapshotContent.snapshotHandle`. + +For information on how to restore a volume via a `VolumeSnapshot` object, refer to the below sections. + +### How the CSI Mechanism Works in this Scenario + +When the VolumeSnapshot object is created with kubectl, the `VolumeSnapshot.uuid` field is used to identify a Longhorn snapshot and the associated `VolumeSnapshotContent` object. + +This creates a new Longhorn snapshot named `snapshot-uuid`. + +Then a backup of that snapshot is initiated, and the CSI request returns. + +Afterwards a `VolumeSnapshotContent` object named `snapcontent-uuid` is created. + +The CSI snapshotter sidecar periodically queries the Longhorn CSI plugin to evaluate the backup status. + +Once the backup is completed, the `VolumeSnapshotContent.readyToUse` flag is set to **true**. + + +## Restore PVC from CSI VolumeSnapshot Associated With Longhorn Backup +Create a `PersistentVolumeClaim` object where the `dataSource` field points to an existing `VolumeSnapshot` object that is associated with Longhorn backup. + +The csi-provisioner will pick this up and instruct the Longhorn CSI driver to provision a new volume with the data from the associated backup. + +An example `PersistentVolumeClaim` is below. The `dataSource` field needs to point to an existing `VolumeSnapshot` object. + +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: test-restore-pvc +spec: + storageClassName: longhorn + dataSource: + name: test-csi-volume-snapshot-longhorn-backup + kind: VolumeSnapshot + apiGroup: snapshot.storage.k8s.io + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 5Gi +``` +Note that the `spec.resources.requests.storage` value must be the same as the size of `VolumeSnapshot` object. + + +#### Restore a Longhorn Backup that Has No Associated `VolumeSnapshot` +You can use the CSI mechanism to restore Longhorn backups that have not been created via the CSI mechanism. 
+To restore Longhorn backups that have not been created via the CSI mechanism, you have to first manually create a `VolumeSnapshot` and `VolumeSnapshotContent` object for the backup. + +Create a `VolumeSnapshotContent` object with the `snapshotHandle` field set to `bak://backup-volume/backup-name`. + +The `backup-volume` and `backup-name` values can be retrieved from the **Backup** page in the Longhorn UI. + +```yaml +apiVersion: snapshot.storage.k8s.io/v1 +kind: VolumeSnapshotContent +metadata: + name: test-existing-backup +spec: + volumeSnapshotClassName: longhorn + driver: driver.longhorn.io + deletionPolicy: Delete + source: + # NOTE: change this to point to an existing backup on the backupstore + snapshotHandle: bak://test-vol/backup-625159fb469e492e + volumeSnapshotRef: + name: test-snapshot-existing-backup + namespace: default +``` + +Create the associated `VolumeSnapshot` object with the `name` field set to `test-snapshot-existing-backup`, where the `source` field refers to a `VolumeSnapshotContent` object via the `volumeSnapshotContentName` field. + +This differs from the creation of a backup, in which case the `source` field refers to a `PerstistentVolumeClaim` via the `persistentVolumeClaimName` field. + +Only one type of reference can be set for a `VolumeSnapshot` object. + +```yaml +apiVersion: snapshot.storage.k8s.io/v1 +kind: VolumeSnapshot +metadata: + name: test-snapshot-existing-backup +spec: + volumeSnapshotClassName: longhorn + source: + volumeSnapshotContentName: test-existing-backup +``` + +Now you can create a `PerstistantVolumeClaim` object that refers to the newly created `VolumeSnapshot` object. +For an example see [Restore PVC from CSI VolumeSnapshot Associated With Longhorn Backup](#restore-pvc-from-csi-volumesnapshot-associated-with-longhorn-backup) above. diff --git a/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-snapshot.md b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-snapshot.md new file mode 100644 index 000000000..92547a4e4 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-snapshot.md @@ -0,0 +1,86 @@ +--- +title: CSI VolumeSnapshot Associated with Longhorn Snapshot +weight: 2 +--- + +Snapshot in Longhorn is an object that represents content of a Longhorn volume at a particular moment. It is stored inside the cluster. + +To programmatically create Longhorn snapshots, you can use the generic Kubernetes CSI VolumeSnapshot mechanism. To learn more about the CSI VolumeSnapshot mechanism, click [here](https://kubernetes.io/docs/concepts/storage/volume-snapshots/). + +> **Prerequisite:** CSI snapshot support needs to be enabled on your cluster. +> If your kubernetes distribution does not provide the kubernetes snapshot controller +> as well as the snapshot related custom resource definitions, you need to manually deploy them. +> For more information, see [Enable CSI Snapshot Support](../enable-csi-snapshot-support). 
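+
+You can also quickly verify that the snapshot CRDs are installed and list the snapshot classes available in the cluster (a minimal check, not specific to Longhorn):
+
+```shell
+# Confirm that the VolumeSnapshot CRDs exist.
+kubectl get crd volumesnapshots.snapshot.storage.k8s.io volumesnapshotclasses.snapshot.storage.k8s.io
+
+# List the VolumeSnapshotClasses currently defined in the cluster.
+kubectl get volumesnapshotclass
+```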
+ +## Create A CSI VolumeSnapshot Associated With Longhorn Snapshot + +To create a CSI VolumeSnapshot associated with a Longhorn snapshot, you first need to create a `VolumeSnapshotClass` object +with the parameter `type` set to `snap` as follow: +```yaml +kind: VolumeSnapshotClass +apiVersion: snapshot.storage.k8s.io/v1 +metadata: + name: longhorn-snapshot-vsc +driver: driver.longhorn.io +deletionPolicy: Delete +parameters: + type: snap +``` +For more information about `VolumeSnapshotClass`, see the kubernetes documentation for [VolumeSnapshotClasses](https://kubernetes.io/docs/concepts/storage/volume-snapshot-classes/). + +After that, create a Kubernetes `VolumeSnapshot` object with `volumeSnapshotClassName` points to the name of the `VolumeSnapshotClass` (`longhorn-snapshot-vsc`) and +the `source` points to the PVC of the Longhorn volume for which a Longhorn snapshot should be created. +```yaml +apiVersion: snapshot.storage.k8s.io/v1 +kind: VolumeSnapshot +metadata: + name: test-csi-volume-snapshot-longhorn-snapshot +spec: + volumeSnapshotClassName: longhorn-snapshot-vsc + source: + persistentVolumeClaimName: test-vol +``` + +**Result:** +A Longhorn snapshot is created. The `VolumeSnapshot` object creation leads to the creation of a `VolumeSnapshotContent` Kubernetes object. +The `VolumeSnapshotContent` refers to a Longhorn snapshot in its `VolumeSnapshotContent.snapshotHandle` field with the name `snap://volume-name/snapshot-name`. + +### Viewing the Longhorn Snapshot + +To see the snapshot, click **Volume** in the top navigation bar and click the volume mentioned in the `VolumeSnapshotContent.snapshotHandle`. Scroll down to see the list of all volume snapshots. + + +### How the CSI Mechanism Works in this Scenario + +When the VolumeSnapshot object is created with kubectl, the `VolumeSnapshot.uuid` field is used to identify a Longhorn snapshot and the associated `VolumeSnapshotContent` object. + +This creates a new Longhorn snapshot named `snapshot-uuid` and the CSI request returns. + +Afterwards a `VolumeSnapshotContent` object named `snapcontent-uuid` is created with the `VolumeSnapshotContent.readyToUse` flag is set to **true**. + + +## Restore PVC from CSI VolumeSnapshot Associated With Longhorn Snapshot +Create a `PersistentVolumeClaim` object where the `dataSource` field points to an existing `VolumeSnapshot` object that is associated with Longhorn snapshot. + +The csi-provisioner will pick this up and instruct the Longhorn CSI driver to provision a new volume with the data from the associated Longhorn snapshot. + +An example `PersistentVolumeClaim` is below. The `dataSource` field needs to point to an existing `VolumeSnapshot` object. + +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: test-restore-pvc +spec: + storageClassName: longhorn + dataSource: + name: test-csi-volume-snapshot-longhorn-snapshot + kind: VolumeSnapshot + apiGroup: snapshot.storage.k8s.io + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 5Gi +``` +Note that the `spec.resources.requests.storage` value must be the same as the size of `VolumeSnapshot` object. 
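+
+Before restoring, it can be useful to confirm that the snapshot has actually been taken. A minimal sketch, using the example object name from above:
+
+```shell
+# Start the restore only once readyToUse reports true.
+kubectl get volumesnapshot test-csi-volume-snapshot-longhorn-snapshot -o jsonpath='{.status.readyToUse}'
+```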
diff --git a/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support.md b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support.md new file mode 100644 index 000000000..fae4132d3 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support.md @@ -0,0 +1,55 @@ +--- +title: Enable CSI Snapshot Support on a Cluster +description: Enable CSI Snapshot Support for Programmatic Creation of Longhorn Snapshots/Backups +weight: 1 +--- + +> **Prerequisite** +> +> It is the responsibility of the Kubernetes distribution to deploy the snapshot controller as well as the related custom resource definitions. +> +> For more information, see [CSI Volume Snapshots](https://kubernetes.io/docs/concepts/storage/volume-snapshots/). + +#### If your Kubernetes Distribution Does Not Bundle the Snapshot Controller + +You may manually install these components by executing the following steps. + + +> **Prerequisite** +> +> Please install the same release version of snapshot CRDs and snapshot controller to ensure that the CRD version is compatible with the snapshot controller. +> +> For general use, update the snapshot controller YAMLs with an appropriate **namespace** prior to installing. +> +> For example, on a vanilla Kubernetes cluster, update the namespace from `default` to `kube-system` prior to issuing the `kubectl create` command. + +Install the Snapshot CRDs: +1. Download the files from https://github.com/kubernetes-csi/external-snapshotter/tree/v6.2.1/client/config/crd +because Longhorn v{{< current-version >}} uses [CSI external-snapshotter](https://kubernetes-csi.github.io/docs/external-snapshotter.html) v6.2.1 +2. Run `kubectl create -k client/config/crd`. +3. Do this once per cluster. + +Install the Common Snapshot Controller: +1. Download the files from https://github.com/kubernetes-csi/external-snapshotter/tree/v6.2.1/deploy/kubernetes/snapshot-controller +because Longhorn v{{< current-version >}} uses [CSI external-snapshotter](https://kubernetes-csi.github.io/docs/external-snapshotter.html) v6.2.1 +2. Update the namespace to an appropriate value for your environment (e.g. `kube-system`) +3. Run `kubectl create -k deploy/kubernetes/snapshot-controller`. +3. Do this once per cluster. +> **Note:** previously, the snapshot controller YAML files were deployed into the `default` namespace by default. +> The updated YAML files are being deployed into `kube-system` namespace by default. +> Therefore, we suggest deleting the previous snapshot controller in the `default` namespace to avoid having multiple snapshot controllers. + +See the [Usage](https://github.com/kubernetes-csi/external-snapshotter#usage) section from the kubernetes +external-snapshotter git repo for additional information. + +#### Add a Default `VolumeSnapshotClass` +Ensure the availability of the Snapshot CRDs. Afterwards create a default `VolumeSnapshotClass`. 
+```yaml +# Use v1 as an example +kind: VolumeSnapshotClass +apiVersion: snapshot.storage.k8s.io/v1 +metadata: + name: longhorn +driver: driver.longhorn.io +deletionPolicy: Delete +``` diff --git a/content/docs/1.5.1/snapshots-and-backups/csi-volume-clone.md b/content/docs/1.5.1/snapshots-and-backups/csi-volume-clone.md new file mode 100644 index 000000000..523b4bea5 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/csi-volume-clone.md @@ -0,0 +1,48 @@ +--- +title: CSI Volume Clone Support +description: Creating a new volume as a duplicate of an existing volume +weight: 3 +--- + +Longhorn supports [CSI volume cloning](https://kubernetes.io/docs/concepts/storage/volume-pvc-datasource/). +Suppose that you have the following `source-pvc`: +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: source-pvc +spec: + storageClassName: longhorn + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 10Gi +``` +You can create a new PVC that has the exact same content as the `source-pvc` by applying the following yaml file: +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: cloned-pvc +spec: + storageClassName: longhorn + dataSource: + name: source-pvc + kind: PersistentVolumeClaim + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 10Gi +``` + +> Note: +> In addition to the requirements listed at [CSI volume cloning](https://kubernetes.io/docs/concepts/storage/volume-pvc-datasource/), +> the `cloned-pvc` must have the same `resources.requests.storage` as the `source-pvc`. + +## History +- [GitHub Issue](https://github.com/longhorn/longhorn/issues/1815) +- [Longhorn Enhancement Proposal](https://github.com/longhorn/longhorn/pull/2864) + +Available since v1.2.0 diff --git a/content/docs/1.5.1/snapshots-and-backups/scheduling-backups-and-snapshots.md b/content/docs/1.5.1/snapshots-and-backups/scheduling-backups-and-snapshots.md new file mode 100644 index 000000000..377fd85de --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/scheduling-backups-and-snapshots.md @@ -0,0 +1,222 @@ +--- +title: Recurring Snapshots and Backups +weight: 3 +--- + +From the Longhorn UI, the volume can refer to recurring snapshots and backups as independent jobs or as recurring job groups. + +To create a recurring job, you can go to the `Recurring Job` page in Longhorn and `Create Recurring Job` or in the volume detail view in Longhorn. + +You can configure, +- Any groups that the job should belong to +- The type of schedule, either `backup`, `backup-force-create`, `snapshot`, `snapshot-force-create`, `snapshot-cleanup`, `snapshot-delete` or `filesystem-trim` +- The time that the backup or snapshot will be created, in the form of a [CRON expression](https://en.wikipedia.org/wiki/Cron#CRON_expression) +- The number of backups or snapshots to retain +- The number of jobs to run concurrently +- Any labels that should be applied to the backup or snapshot + +Recurring jobs can be set up using the Longhorn UI, `kubectl`, or by using a Longhorn `RecurringJob`. + +To add a recurring job to a volume, you will go to the volume detail view in Longhorn. Then you can set `Recurring Jobs Schedule`. + +- Create a new recurring job +- Select from existing recurring jobs +- Select from existing recurring job groups + +Then Longhorn will automatically create snapshots or backups for the volume at the recurring job scheduled time, as long as the volume is attached to a node. 
+If you want to set up recurring snapshots and backups even when the volumes are detached, see the section [Allow Recurring Job While Volume Is Detached](#allow-recurring-job-while-volume-is-detached) + +You can set recurring jobs on a Longhorn Volume, Kubernetes Persistent Volume Claim (PVC), or Kubernetes StorageClass. +> Note: When the PVC has recurring job labels, they will override all recurring job labels of the associated Volume. + +For more information on how snapshots and backups work, refer to the [concepts](../../concepts) section. + +> Note: To avoid the problem that recurring jobs may overwrite the old backups/snapshots with identical backups and empty snapshots when the volume doesn't have new data for a long time, Longhorn does the following: +> 1. Recurring backup job only takes a new backup when the volume has new data since the last backup. +> 1. Recurring snapshot job only takes a new snapshot when the volume has new data in the volume head (the live data). + +## Set up Recurring Jobs + +### Using the Longhorn UI + +Recurring snapshots and backups can be configured from the `Recurring Job` page or the volume detail page. + +### Using the manifest + +You can also configure the recurring job by directly interacting with the Longhorn RecurringJob custom resource. +```yaml +apiVersion: longhorn.io/v1beta1 +kind: RecurringJob +metadata: + name: snapshot-1 + namespace: longhorn-system +spec: + cron: "* * * * *" + task: "snapshot" + groups: + - default + - group1 + retain: 1 + concurrency: 2 + labels: + label/1: a + label/2: b +``` + +The following parameters should be specified for each recurring job selector: + +- `name`: Name of the recurring job. Do not use duplicate names. And the length of `name` should be no more than 40 characters. + +- `task`: Type of the job. Longhorn supports the following: + - `backup`: periodically create snapshots then do backups after cleaning up outdated snapshots + - `backup-force-create`: periodically create snapshots the do backups + - `snapshot`: periodically create snapshots after cleaning up outdated snapshots + - `snapshot-force-create`: periodically create snapshots + - `snapshot-cleanup`: periodically purge removable snapshots and system snapshots + > **Note:** retain value has no effect for this task, Longhorn automatically mutates the `retain` value to 0. + + - `snapshot-delete`: periodically remove and purge all kinds of snapshots that exceed the retention count. + > **Note:** The `retain` value is independent of each recurring job. + > + > Using a volume with 2 recurring jobs as an example: + > - `snapshot` with retain value set to 5 + > - `snapshot-delete`: with retain value set to 2 + > + > Eventually, there will be 2 snapshots retained after a complete `snapshot-delete` task execution. + + - `filesystem-trim`: periodically trim filesystem to reclaim disk space + +- `cron`: Cron expression. It tells the execution time of the job. + +- `retain`: How many snapshots/backups Longhorn will retain for each volume job. It should be no less than 1. + +- `concurrency`: The number of jobs to run concurrently. It should be no less than 1. + +Optional parameters can be specified: + +- `groups`: Any groups that the job should belong to. Having `default` in groups will automatically schedule this recurring job to any volume with no recurring job. + +- `labels`: Any labels that should be applied to the backup or snapshot. 
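+
+After writing a manifest like the one above, applying and verifying it is straightforward. A minimal sketch (the file name `recurringjob.yaml` is only an example):
+
+```shell
+# Create the RecurringJob custom resource defined in the manifest.
+kubectl apply -f recurringjob.yaml
+
+# Confirm that Longhorn has registered the recurring job.
+kubectl -n longhorn-system get recurringjobs.longhorn.io
+```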
+
+## Add Recurring Jobs to the Default group
+
+Default recurring jobs can be set by ticking the `default` checkbox in the UI or by adding `default` to the recurring job's `groups`.
+
+Longhorn will automatically add a volume to the `default` group when the volume has no recurring job.
+
+## Delete Recurring Jobs
+
+Longhorn automatically removes Volume and PVC recurring job labels when the corresponding RecurringJob custom resource is deleted. However, if a recurring job label is added without an existing RecurringJob custom resource, Longhorn does not perform the cleanup process for that label.
+
+## Apply Recurring Job to Longhorn Volume
+
+### Using the Longhorn UI
+
+The recurring job can be assigned on the volume detail page. To navigate to the volume detail page, click **Volume**, then click the name of the volume.
+
+### Using the `kubectl` command
+
+Add a recurring job group:
+```
+kubectl -n longhorn-system label volume/<volume-name> recurring-job-group.longhorn.io/<group-name>=enabled
+
+# Example:
+# kubectl -n longhorn-system label volume/pvc-8b9cd514-4572-4eb2-836a-ed311e804d2f recurring-job-group.longhorn.io/default=enabled
+```
+
+Add a recurring job:
+```
+kubectl -n longhorn-system label volume/<volume-name> recurring-job.longhorn.io/<job-name>=enabled
+
+# Example:
+# kubectl -n longhorn-system label volume/pvc-8b9cd514-4572-4eb2-836a-ed311e804d2f recurring-job.longhorn.io/backup=enabled
+```
+
+Remove a recurring job:
+```
+kubectl -n longhorn-system label volume/<volume-name> recurring-job.longhorn.io/<job-name>-
+
+# Example:
+# kubectl -n longhorn-system label volume/pvc-8b9cd514-4572-4eb2-836a-ed311e804d2f recurring-job.longhorn.io/backup-
+```
+
+## With PersistentVolumeClaim Using the `kubectl` command
+
+By default, applying a recurring job label to a Persistent Volume Claim (PVC) does not have any effect. You can enable or disable this feature using the recurring job source label.
+
+Once the PVC is labeled as the source, any recurring job labels added to or removed from the PVC will be periodically synchronized by Longhorn to the associated Volume.
+```
+kubectl -n <namespace> label pvc/<pvc-name> recurring-job.longhorn.io/source=enabled
+
+# Example:
+# kubectl -n default label pvc/sample recurring-job.longhorn.io/source=enabled
+```
+
+Add a recurring job group:
+```
+kubectl -n <namespace> label pvc/<pvc-name> recurring-job-group.longhorn.io/<group-name>=enabled
+
+# Example:
+# kubectl -n default label pvc/sample recurring-job-group.longhorn.io/default=enabled
+```
+
+Add a recurring job:
+```
+kubectl -n <namespace> label pvc/<pvc-name> recurring-job.longhorn.io/<job-name>=enabled
+
+# Example:
+# kubectl -n default label pvc/sample recurring-job.longhorn.io/backup=enabled
+```
+
+Remove a recurring job:
+```
+kubectl -n <namespace> label pvc/<pvc-name> recurring-job.longhorn.io/<job-name>-
+
+# Example:
+# kubectl -n default label pvc/sample recurring-job.longhorn.io/backup-
+```
+
+## With StorageClass parameters
+
+Recurring job assignment can be configured via the `recurringJobSelector` parameter in a StorageClass.
+
+Any future volumes created using this StorageClass will have those recurring jobs automatically assigned.
+
+The `recurringJobSelector` field should follow JSON format:
+```yaml
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+  name: longhorn
+provisioner: driver.longhorn.io
+parameters:
+  numberOfReplicas: "3"
+  staleReplicaTimeout: "30"
+  fromBackup: ""
+  recurringJobSelector: '[
+    {
+      "name":"snap",
+      "isGroup":true
+    },
+    {
+      "name":"backup",
+      "isGroup":false
+    }
+  ]'
+```
+
+The following parameters should be specified for each recurring job selector:
+
+1. `name`: Name of an existing recurring job or an existing recurring job group.
+
+2. `isGroup`: Whether the name refers to a recurring job group (`true`) or a single recurring job (`false`).
+
+
+## Allow Recurring Job While Volume Is Detached
+
+Longhorn provides the setting `allow-recurring-job-while-volume-detached` that allows you to create recurring backups even when a volume is detached.
+You can find the setting in the Longhorn UI.
+
+When the setting is enabled, Longhorn will automatically attach the volume and take a snapshot/backup when it is time to do a recurring snapshot/backup.
+
+Note that while the volume is attached automatically, it is not ready for the workload. The workload will have to wait until the recurring job finishes.
diff --git a/content/docs/1.5.1/snapshots-and-backups/setup-a-snapshot.md b/content/docs/1.5.1/snapshots-and-backups/setup-a-snapshot.md new file mode 100644 index 000000000..d9d04fb0b --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/setup-a-snapshot.md @@ -0,0 +1,14 @@
+---
+  title: Create a Snapshot
+  weight: 1
+---
+
+A [snapshot](../../concepts/#24-snapshots) is the state of a Kubernetes Volume at any given point in time.
+
+To create a snapshot of an existing volume:
+
+1. In the top navigation bar of the Longhorn UI, click **Volume.**
+2. Click the name of the volume of which you want a snapshot. This leads to the volume detail page.
+3. Click the **Take Snapshot** button.
+
+Once the snapshot is created, you'll see it in the volume's list of snapshots, before the Volume Head.
\ No newline at end of file
diff --git a/content/docs/1.5.1/snapshots-and-backups/setup-disaster-recovery-volumes.md b/content/docs/1.5.1/snapshots-and-backups/setup-disaster-recovery-volumes.md new file mode 100644 index 000000000..fabc2ea24 --- /dev/null +++ b/content/docs/1.5.1/snapshots-and-backups/setup-disaster-recovery-volumes.md @@ -0,0 +1,30 @@
+---
+title: Disaster Recovery Volumes
+description: Setting up and activating disaster recovery volumes.
+weight: 4
+---
+
+A **disaster recovery (DR) volume** is a special volume that is mainly intended to store data in a backup cluster in case the whole main cluster goes down. Disaster recovery volumes are used to increase the resiliency of Longhorn volumes.
+
+For a longer explanation of how DR volumes work, see the [concepts section.](../../concepts/#33-disaster-recovery-volumes)
+
+For a disaster recovery volume, `Last Backup` indicates the most recent backup of its original backup volume.
+
+If the icon representing the DR volume is gray, the volume is still restoring the `Last Backup` and cannot be activated yet. If the icon is blue, the volume has restored the `Last Backup`.
+
+## Creating DR Volumes {#creating}
+
+> **Prerequisites:** Set up two Kubernetes clusters. These will be called cluster A and cluster B. Install Longhorn on both clusters, and set the same backup target on both clusters. For help setting the backup target, refer to [this page.](../backup-and-restore/set-backup-target)
+
+1. In cluster A, make sure the original volume X has a backup created or has recurring backups scheduled.
+2. On the Backup page of cluster B, choose backup volume X, then create disaster recovery volume Y. It is highly recommended to use the backup volume name as the DR volume name.
+3. Longhorn will automatically attach the DR volume Y to a random node. Then Longhorn will start polling for the last backup of volume X and incrementally restore it to volume Y.
+ +## Activating DR Volumes {#activating} + +Longhorn supports activating a disaster recovery (DR) volume under the following conditions: + +- The volume is healthy, indicating that all replicas are in a healthy state. +- When the global setting [`Allow Volume Creation with Degraded Availability`](../../references/settings/#allow-volume-creation-with-degraded-availability) is enabled, the volume is degraded, indicating some replicas are unhealthy. + +When the setting `Allow Volume Creation with Degraded Availability` is disabled, attempting to activate a degraded DR volume will cause the volume to become stuck in the attached state. However, after enabling the setting, the DR volume will be activated and converted into a normal volume, remaining in the detached state. diff --git a/content/docs/1.5.1/spdk/_index.md b/content/docs/1.5.1/spdk/_index.md new file mode 100644 index 000000000..57bd357d4 --- /dev/null +++ b/content/docs/1.5.1/spdk/_index.md @@ -0,0 +1,4 @@ +--- +title: V2 Data Engine (Preview Feature) +weight: 0 +--- diff --git a/content/docs/1.5.1/spdk/automatic-offline-replica-rebuilding.md b/content/docs/1.5.1/spdk/automatic-offline-replica-rebuilding.md new file mode 100644 index 000000000..2109383e3 --- /dev/null +++ b/content/docs/1.5.1/spdk/automatic-offline-replica-rebuilding.md @@ -0,0 +1,29 @@ +--- + title: Automatic Offline Replica Rebuilding + weight: 4 +--- + +## Introduction + +Currently, Longhorn does not support online replica rebuilding for volumes that use the V2 Data Engine. To overcome this limitation, an automatic offline replica rebuilding mechanism has been implemented. When a degraded volume is detached, this mechanism attaches the volume in maintenance mode, and initiates the rebuilding process. Once the rebuilding is completed, the volume is detached as per the user's expectation. + +## Settings + +### Global Settings + +- **offline-replica-rebuilding**
+ + This setting allows users to enable the offline replica rebuilding for volumes using V2 Data Engine. The value is `enabled` by default, and available options are: + + - **disabled** + - **enabled** + +### Per-Volume Settings + +Longhorn also supports the per-volume setting by configuring `Volume.Spec.OfflineReplicaRebuilding`. The value is `ignored` by default, so data integrity check is determined by the global setting `offline-replica-rebuilding`. `Volume.Spec.OfflineReplicaRebuilding` supports **ignored**, **disabled** and **enabled**. Each volume can have its offline replica rebuilding customized. + +## Notice + +During the offline replica rebuilding process, it is important to note that interruptions are possible. In the case where a volume, which is undergoing rebuilding, is about to be attached by an application, the offline replica rebuilding task will be cancelled to prioritize the high-priority task. This mechanism ensures that critical tasks take precedence over the rebuilding process. + + diff --git a/content/docs/1.5.1/spdk/features.md b/content/docs/1.5.1/spdk/features.md new file mode 100644 index 000000000..534ca4d54 --- /dev/null +++ b/content/docs/1.5.1/spdk/features.md @@ -0,0 +1,13 @@ +--- +title: Features +weight: 1 +--- + +- Volume lifecycle (creation, attachment, detachment and deletion) +- Degraded volume +- Offline replica rebuilding +- Block disk management +- Orphaned replica management + +In addition to the features mentioned above, additional functionalities such as replica number adjustment, online replica rebuilding, snapshot, backup, restore and so on will be introduced in future versions. + diff --git a/content/docs/1.5.1/spdk/performance-benchmark.md b/content/docs/1.5.1/spdk/performance-benchmark.md new file mode 100644 index 000000000..dc16bddb3 --- /dev/null +++ b/content/docs/1.5.1/spdk/performance-benchmark.md @@ -0,0 +1,46 @@ +--- + title: Performance Benchmark + weight: 5 +--- + +## Benchmarking Tool + +Utilize [kbench](https://github.com/yasker/kbench) as the benchmarking tool. + +## Baseline + +The baseline of the data disk was also measured using [rancher/local-path-provisioner](https://github.com/rancher/local-path-provisioner). 
+ +## Equinix (m3.small.x86) + +- Machine: Japan/m3.small.x86 +- CPU: Intel(R) Xeon(R) E-2378G CPU @ 2.80GHz +- RAM: 64 GiB +- Kubernetes: v1.23.6+rke2r2 +- Nodes: 3 (each node is a master and also a worker) +- OS: Ubuntu 22.04 / 5.15.0-33-generic +- Storage: 1 SSD (Micron_5300_MTFD) +- Network throughput between nodes (tested by iperf over 60 seconds): 15.0 Gbits/sec + +{{< figure src="/img/diagrams/spdk/equinix-iops.svg" >}} + +{{< figure src="/img/diagrams/spdk/equinix-bw.svg" >}} + +{{< figure src="/img/diagrams/spdk/equinix-latency.svg" >}} + +# AWS EC2 (c5d.xlarge) + +- Machine: Tokyo/c5d.xlarge +- CPU: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz +- RAM: 8 GiB +- Kubernetes: v1.25.10+rke2r1 +- Nodes: 3 (each node is a master and also a worker) +- OS: Ubuntu 22.04.2 LTS / 5.19.0-1025-aws +- Storage: 1 SSD (Amazon EC2 NVMe Instance Storage/Local NVMe Storage) +- Network throughput between nodes (tested by iperf over 60 seconds): 7.9 Gbits/sec + +{{< figure src="/img/diagrams/spdk/aws-c5d-xlarge-iops.svg" >}} + +{{< figure src="/img/diagrams/spdk/aws-c5d-xlarge-bw.svg" >}} + +{{< figure src="/img/diagrams/spdk/aws-c5d-xlarge-latency.svg" >}} diff --git a/content/docs/1.5.1/spdk/prerequisites.md b/content/docs/1.5.1/spdk/prerequisites.md new file mode 100644 index 000000000..d3ef045ba --- /dev/null +++ b/content/docs/1.5.1/spdk/prerequisites.md @@ -0,0 +1,41 @@ +--- +title: Prerequisites +weight: 2 +--- + +## Prerequisites + +Longhorn nodes must meet the following requirements: + +- x86-64 CPU with SSE4.2 instruction support + > **NOTICE** + > + > Currently, V2 Data Engine only supports `x86_64` platform. + +- Linux kernel + + 5.13 or later is required for NVMe over TCP support + + +- Linux kernel modules + - uio + - uio_pci_generic + - nvme-tcp + +- HugePage support + - 1 GiB of 2 MiB-sized pages + +## Notice + +### CPU + +When the V2 Data Engine is enabled, each instance-manager pod utilizes **1 CPU core**. This high CPU usage is attributed to the `spdk_tgt` process running within each instance-manager pod. The spdk_tgt process is responsible for handling input/output (IO) operations and requires intensive polling. As a result, it consumes 100% of a dedicated CPU core to efficiently manage and process the IO requests, ensuring optimal performance and responsiveness for storage operations. + +### Memory + +SPDK utilizes huge pages to enhance performance and minimize memory overhead. To enable the usage of huge pages, it is necessary to configure 2MiB-sized huge pages on each Longhorn node. Specifically, **512 pages (equivalent to a total of 1 GiB)** need to be available on each Longhorn node. + + +### Disk + +**Local NVMe disks** are highly recommended for optimal storage performance of volumes using V2 Data Engine. 
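+
+A quick way to confirm these requirements on a node is shown below. This is a minimal sketch using standard Linux interfaces; adjust it for your distribution as needed:
+
+```shell
+# The kernel must be 5.13 or later for NVMe over TCP support.
+uname -r
+
+# The CPU must support SSE4.2.
+grep -m1 -o sse4_2 /proc/cpuinfo
+
+# The required kernel modules should load without errors.
+modprobe uio
+modprobe uio_pci_generic
+modprobe nvme-tcp
+lsmod | grep -E 'uio|nvme_tcp'
+
+# At least 512 huge pages of 2 MiB (1 GiB in total) should be configured.
+cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+```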
\ No newline at end of file diff --git a/content/docs/1.5.1/spdk/quick-start.md b/content/docs/1.5.1/spdk/quick-start.md new file mode 100644 index 000000000..dc45483ef --- /dev/null +++ b/content/docs/1.5.1/spdk/quick-start.md @@ -0,0 +1,288 @@ +--- + title: Quick Start + weight: 3 +--- + +**Table of Contents** +- [Prerequisites](#prerequisites) + - [Configure Kernel Modules and Huge Pages](#configure-kernel-modules-and-huge-pages) + - [Install NVMe Userspace Tool and Load `nvme-tcp` Kernel Module](#install-nvme-userspace-tool-and-load-nvme-tcp-kernel-module) + - [Load Kernel Modules Automatically on Boot](#load-kernel-modules-automatically-on-boot) + - [Restart `kubelet`](#restart-kubelet) + - [Check Environment](#check-environment) +- [Installation](#installation) + - [Install Longhorn System](#install-longhorn-system) + - [Enable V2 Data Engine](#enable-v2-data-engine) + - [CPU and Memory Usage](#cpu-and-memory-usage) + - [Add `block-type` Disks in Longhorn Nodes](#add-block-type-disks-in-longhorn-nodes) + - [Prepare disks](#prepare-disks) + - [Add disks to `node.longhorn.io`](#add-disks-to-nodelonghornio) +- [Application Deployment](#application-deployment) + - [Create a StorageClass](#create-a-storageclass) + - [Create Longhorn Volumes](#create-longhorn-volumes) + +--- + +Longhorn's V2 Data Engine harnesses the power of the Storage Performance Development Kit (SPDK) to elevate its overall performance. The integration significantly reduces I/O latency while simultaneously boosting IOPS and throughput. The enhancement provides a high-performance storage solution capable of meeting diverse workload demands. + +**V2 Data Engine is currently a PREVIEW feature and should NOT be utilized in a production environment.** At present, a volume with V2 Data Engine only supports + +- Volume lifecycle (creation, attachment, detachment and deletion) +- Degraded volume +- Offline replica rebuilding +- Block disk management +- Orphaned replica management + +In addition to the features mentioned above, additional functionalities such as replica number adjustment, online replica rebuilding, snapshot, backup, restore and so on will be introduced in future versions. + +This tutorial will guide you through the process of configuring the environment and create Kubernetes persistent storage resources of persistent volumes (PVs) and persistent volume claims (PVCs) that correspond to Longhorn volumes using V2 Data Engine. + +## Prerequisites + +### Configure Kernel Modules and Huge Pages + +For Debian and Ubuntu, please install Linux kernel extra modules before loading the kernel modules +``` +apt install -y linux-modules-extra-`uname -r` +``` + +We provide a manifest that helps you configure the kernel modules and huge pages automatically, making it easier to set up. +``` +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/prerequisite/longhorn-spdk-setup.yaml +``` + +And also can check the log with the following command to see the installation result. +``` +Cloning into '/tmp/spdk'... +INFO: Requested 512 hugepages but 512 already allocated on node0 +SPDK environment is configured successfully +``` + +Or, you can install them manually by following these steps. +- Load the kernel modules on the each Longhorn node + ``` + modprobe uio + modprobe uio_pci_generic + ``` + +- Configure huge pages + SPDK utilizes huge pages to enhance performance and minimize memory overhead. 
To enable the usage of huge pages, it is necessary to configure 2MiB-sized huge pages on each Longhorn node. Specifically, 512 pages (equivalent to a total of 1 GiB) need to be available on each Longhorn node. To allocate the huge pages, run the following commands on each node. + ``` + echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + ``` + + To make the change permanent, add the following line to the file /etc/sysctl.conf. + ``` + echo "vm.nr_hugepages=512" >> /etc/sysctl.conf + ``` + +### Install NVMe Userspace Tool and Load `nvme-tcp` Kernel Module + +> **NOTICE:** +> +> Make sure that the version of `nvme-cli` is equal to or greater than `1.12`. +> +> If the version of `nvme-cli` installed by the below steps is not equal to or greater than `1.12`., you will need to compile the utility from the [source codes](https://github.com/linux-nvme/nvme-cli) and install it on each Longhorn node by manual. +> +> Also, install the **uuid development library** before compiling to support the `show-hostnqn` subcommand. +> +> For SUSE/OpenSUSE you can install it use this command: +> ``` +> zypper install uuid-devel +> ``` +> +> For Debian and Ubuntu, use this command: +> ``` +> apt install uuid-dev +> ``` +> +> For RHEL, CentOS, and EKS with `EKS Kubernetes Worker AMI with AmazonLinux2 image`, use this command: +> ``` +> yum install uuid-devel +> ``` +> + +And also can check the log with the following command to see the installation result +``` +nvme-cli install successfully +``` + +We provide a manifest that helps you finish the deployment on each Longhorn node. +``` +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/deploy/prerequisite/longhorn-nvme-cli-installation.yaml +``` + +Or, you can manually install them. +- Install nvme-cli on each node and make sure that the version of `nvme-cli` is equal to or greater than `1.12`. + + For SUSE/OpenSUSE you can install it use this command: + ``` + zypper install nvme-cli + ``` + + For Debian and Ubuntu, use this command: + ``` + apt install nvme-cli + ``` + + For RHEL, CentOS, and EKS with `EKS Kubernetes Worker AMI with AmazonLinux2 image`, use this command: + ``` + yum install nvme-cli + ``` + + To check the version of nvme-cli, execute the following command. + ``` + nvme version + ``` + +- Load `nvme-tcp` kernel module on the each Longhorn node + ``` + modprobe nvme-tcp + ``` + +### Load Kernel Modules Automatically on Boot + +Rather than manually loading kernel modules `uio`, `uio_pci_generic` and `nvme-tcp` each time after reboot, you can streamline the process by configuring automatic module loading during the boot sequence. For detailed instructions, please consult the manual provided by your operating system. + +Reference: +- [SUSE/OpenSUSE: Loading kernel modules automatically on boot](https://documentation.suse.com/sles/15-SP4/html/SLES-all/cha-mod.html#sec-mod-modprobe-d) +- [Ubuntu: Configure kernel modules to load at boot](https://manpages.ubuntu.com/manpages/jammy/man5/modules-load.d.5.html) +- [RHEL: Loading kernel modules automatically at system boot time](https://access.redhat.com/documentation/zh-tw/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/managing-kernel-modules_managing-monitoring-and-updating-the-kernel) + +### Restart `kubelet` + +After finishing the above steps, restart kubelet on each node. 
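+
+How the kubelet is restarted depends on your Kubernetes distribution. The commands below are common examples rather than an exhaustive list; verify the service name used by your setup:
+
+```shell
+# Vanilla Kubernetes / kubeadm-based nodes
+systemctl restart kubelet
+
+# K3s (server nodes; use k3s-agent on agent nodes)
+systemctl restart k3s
+
+# RKE2 (server nodes; use rke2-agent on worker nodes)
+systemctl restart rke2-server
+```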
+ +### Check Environment + +Make sure everything is correctly configured and installed by +``` +bash -c "$(curl -sfL https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/scripts/environment_check.sh)" -s -s +``` + +## Installation + +### Install Longhorn System + +Follow the steps in Quick Installation to install Longhorn system. + +### Enable V2 Data Engine + +Enable the V2 Data Engine by changing the `v2-data-engine` setting to `true` after installation. Following this, the instance-manager pods will be automatically restarted. + +Or, you can enable it in `Setting > General > V2 Data Engine`. + +### CPU and Memory Usage + +When the V2 Data Engine is enabled, each instance-manager pod utilizes 1 CPU core. This high CPU usage is attributed to the `spdk_tgt` process running within each instance-manager pod. The spdk_tgt process is responsible for handling input/output (IO) operations and requires intensive polling. As a result, it consumes 100% of a dedicated CPU core to efficiently manage and process the IO requests, ensuring optimal performance and responsiveness for storage operations. + +``` +NAME CPU(cores) MEMORY(bytes) +csi-attacher-6488f75fb4-48pnb 1m 19Mi +csi-attacher-6488f75fb4-94m6r 1m 16Mi +csi-attacher-6488f75fb4-zmwfm 1m 15Mi +csi-provisioner-6785d78459-6tps7 1m 18Mi +csi-provisioner-6785d78459-bj89g 1m 23Mi +csi-provisioner-6785d78459-c5dzt 1m 17Mi +csi-resizer-d9bb7b7fc-25m8b 1m 17Mi +csi-resizer-d9bb7b7fc-fncjf 1m 15Mi +csi-resizer-d9bb7b7fc-t5dw7 1m 17Mi +csi-snapshotter-5b89555c8f-76ptq 1m 15Mi +csi-snapshotter-5b89555c8f-7vgtv 1m 19Mi +csi-snapshotter-5b89555c8f-vkhd8 1m 17Mi +engine-image-ei-b907910b-5vp8h 12m 15Mi +engine-image-ei-b907910b-9krcz 17m 15Mi +instance-manager-b3735b3e6d0a9e27d1464f548bdda5ec 1000m 29Mi +instance-manager-cbe60909512c58798690f692b883e5a9 1001m 27Mi +longhorn-csi-plugin-qf9kt 1m 61Mi +longhorn-csi-plugin-zk6sm 1m 60Mi +longhorn-driver-deployer-7d46fd5945-8tfmk 1m 24Mi +longhorn-manager-nm925 6m 137Mi +longhorn-manager-np849 6m 126Mi +longhorn-ui-54df99bfc-2lc8w 0m 2Mi +longhorn-ui-54df99bfc-w6dts 0m 2Mi +``` + + +You can observe the utilization of allocated huge pages on each node by running the command `kubectl get node -o yaml`. +``` +# kubectl get node sles-pool1-07437316-4jw8f -o yaml +... + +status: + ... + allocatable: + cpu: "8" + ephemeral-storage: "203978054087" + hugepages-1Gi: "0" + hugepages-2Mi: 1Gi + memory: 31813168Ki + pods: "110" + capacity: + cpu: "8" + ephemeral-storage: 209681388Ki + hugepages-1Gi: "0" + hugepages-2Mi: 1Gi + memory: 32861744Ki + pods: "110" +... +``` + +### Add `block-type` Disks in Longhorn Nodes + +Unlike `filesystem-type` disks that are designed for legacy volumes, volumes using V2 Data Engine are persistent on `block-type` disks. Therefore, it is necessary to equip Longhorn nodes with `block-type` disks. + +#### Prepare disks + +If there are no additional disks available on the Longhorn nodes, you can create loop block devices to test the feature. To accomplish this, execute the following command on each Longhorn node to create a 10 GiB block device. +``` +dd if=/dev/zero of=blockfile bs=1M count=10240 +losetup -f blockfile +``` + +To display the path of the block device when running the command `losetup -f blockfile`, use the following command. +``` +losetup -j blockfile +``` + +#### Add disks to `node.longhorn.io` + +You can add the disk by navigating to the Node UI page and specify the `Disk Type` as `Block`. Next, provide the block device's path in the `Path` field. 
+ +Or, edit the `node.longhorn.io` resource. +``` +kubectl -n longhorn-system edit node.longhorn.io +``` + +Add the disk to `Spec.Disks` +``` +: + allowScheduling: true + evictionRequested: false + path: /PATH/TO/BLOCK/DEVICE + storageReserved: 0 + tags: [] + diskType: block +``` + +Wait for a while, you will see the disk is displayed in the `Status.DiskStatus`. + +## Application Deployment + +After the installation and configuration, we can dyamically provision a Persistent Volume using V2 Data Engine as the following steps. + +### Create a StorageClass + +Use following command to create a StorageClass called `longhorn-spdk`. Set `parameters.backendStoreDriver` to `spdk` to utilize V2 Data Engine. +``` +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/examples/v2/storageclass.yaml +``` + +### Create Longhorn Volumes + +Create a Pod that uses Longhorn volumes using V2 Data Engine by running this command: +``` +kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/examples/v2/pod_with_pvc.yaml +``` + +Or, if you are creating a volume on Longhorn UI, please specify the `Backend Data Engine` as `v2`. diff --git a/content/docs/1.5.1/spdk/troubleshooting.md b/content/docs/1.5.1/spdk/troubleshooting.md new file mode 100644 index 000000000..4b42f4a74 --- /dev/null +++ b/content/docs/1.5.1/spdk/troubleshooting.md @@ -0,0 +1,59 @@ +--- +title: Troubleshooting +weight: 6 +--- + +- [Installation](#installation) + - ["Package 'linux-modules-extra-x.x.x-x-generic' Has No Installation Candidate" Error During Installation on Debian Machines](#package-linux-modules-extra-xxx-x-generic-has-no-installation-candidate-error-during-installation-on-debian-machines) +- [Disk](#disk) + - ["Invalid argument" Error in Disk Status After Adding a Block-Type Disk](#invalid-argument-error-in-disk-status-after-adding-a-block-type-disk) + +--- + +## Installation + +### "Package 'linux-modules-extra-x.x.x-x-generic' Has No Installation Candidate" Error During Installation on Debian Machines + +For Debian machines, if you encounter errors similar to the below when installing Linux kernel extra modules, you need to find an available version in the pkg collection websites like [this](https://pkgs.org/search/?q=linux-modules-extra) rather than directly relying on `uname -r` instead: +```log +apt install -y linux-modules-extra-`uname -r` +Reading package lists... Done +Building dependency tree... Done +Reading state information... Done +Package linux-modules-extra-5.15.0-67-generic is not available, but is referred to by another package. 
+This may mean that the package is missing, has been obsoleted, or +is only available from another source + +E: Package 'linux-modules-extra-5.15.0-67-generic' has no installation candidate +``` + +For example, for Ubuntu 22.04, one valid version is `linux-modules-extra-5.15.0-76-generic`: +```shell +apt update -y +apt install -y linux-modules-extra-5.15.0-76-generic +``` + +## Disk + +### "Invalid argument" Error in Disk Status After Adding a Block-Type Disk + +After adding a block-type disk, the disk status displays error messages: +``` +Disk disk-1(/dev/nvme1n1) on node dereksu-ubuntu-pool1-bf77ed93-2d2p9 is not ready: +failed to generate disk config: error: rpc error: code = Internal desc = rpc error: code = Internal +desc = failed to add block device: failed to create AIO bdev: error sending message, id 10441, +method bdev_aio_create, params {disk-1 /host/dev/nvme1n1 4096}: {"code": -22,"message": "Invalid argument"} +``` + +Next, inspect the log message of the instance-manager pod on the same node. If the log reveals the following: +``` +[2023-06-29 08:51:53.762597] bdev_aio.c: 762:create_aio_bdev: *WARNING*: Specified block size 4096 does not match auto-detected block size 512 +[2023-06-29 08:51:53.762640] bdev_aio.c: 788:create_aio_bdev: *ERROR*: Disk size 100000000000 is not a multiple of block size 4096 +``` +These messages indicate that the size of your disk is not a multiple of the block size 4096 and is not supported by Longhorn system. + +To resolve this issue, you can follow the steps +1. Remove the newly added block-type disk from the node. +2. Partition the block-type disk using the `fdisk` utility and ensure that the partition size is a multiple of the block size 4096. +3. Add the partitioned disk to the Longhorn node. + diff --git a/content/docs/1.5.1/terminology.md b/content/docs/1.5.1/terminology.md new file mode 100644 index 000000000..adf37ea9d --- /dev/null +++ b/content/docs/1.5.1/terminology.md @@ -0,0 +1,213 @@ +--- +title: Terminology +weight: 4 +--- + +- [Attach/Reattach](#attachreattach) +- [Backup](#backup) +- [Backupstore](#backupstore) +- [Backup target](#backup-target) +- [Backup volume](#backup-volume) +- [Block storage](#block-storage) +- [CRD](#crd) +- [CSI Driver](#csi-driver) +- [Disaster Recovery Volumes (DR volume)](#disaster-recovery-volumes-dr-volume) +- [ext4](#ext4) +- [Frontend expansion](#frontend-expansion) +- [Instance Manager](#instance-manager) +- [Longhorn volume](#longhorn-volume) +- [Mount](#mount) +- [NFS](#nfs) +- [Object storage](#object-storage) +- [Offline expansion](#offline-expansion) +- [Overprovisioning](#overprovisioning) +- [PersistentVolume](#persistentvolume) +- [PersistentVolumeClaim](#persistentvolumeclaim) +- [Primary backups](#primary-backups) +- [Remount](#remount) +- [Replica](#replica) +- [S3](#s3) +- [Salvage a volume](#salvage-a-volume) +- [Secondary backups](#secondary-backups) +- [Snapshot](#snapshot) +- [Stable identity](#stable-identity) +- [StatefulSet](#statefulset) +- [StorageClass](#storageclass) +- [System Backup](#system-backup) +- [Thin provisioning](#thin-provisioning) +- [Umount](#umount) +- [Volume (Kubernetes concept)](#volume-kubernetes-concept) +- [XFS](#xfs) +- [SMB/CIFS](#smbcifs) + +### Attach/Reattach + +To attach a block device is to make it appear on the Linux node, e.g. `/dev/longhorn/testvol` + +If the volume engine dies unexpectedly, Longhorn will reattach the volume. + +### Backup + +A backup is an object in the backupstore. The backupstore may contain volume backups and system backups. 
+ +### Backupstore + +Longhorn backups are saved to the backupstore, which is external to the Kubernetes cluster. The backupstore can be either NFS shares or an S3 compatible server. + +Longhorn accesses the backupstore at the endpoint configured in the backuptarget. + +### Backup target + +A backup target is the endpoint used to access a backupstore in Longhorn. + +### Backup volume + +A backup volume is the backup that maps to one original volume, and it is located in the backupstore. Backup volumes can be viewed on the **Backup** page in the Longhorn UI. The backup volume will contain multiple backups for the same volume. + +Backups can be created from snapshots. They contain the state of the volume at the time the snapshot was created, but they don't contain snapshots, so they do not contain the history of changes to the volume data. While backups are made of 2 MB files, snapshots can be terabytes. + +Backups are made of 2 MB blocks in an object store. + +For a longer explanation of how snapshots and backups work, refer to the [conceptual documentation.](../concepts/#241-how-snapshots-work) + +### Block storage + +An approach to storage in which data stored in fixed-size blocks. Each block is distinguished based on a memory address. + +### CRD + +A Kubernetes [custom resource definition.](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) + +### CSI Driver + +The Longhorn CSI Driver is a [container storage interface](https://kubernetes-csi.github.io/docs/drivers.html) that can be used with Kubernetes. The CSI driver for Longhorn volumes is named `driver.longhorn.io`. + +### Disaster Recovery Volumes (DR volume) + +A DR volume is a special volume that stores data in a backup cluster in case the whole main cluster goes down. DR volumes are used to increase the resiliency of Longhorn volumes. + +Each backup volume in the backupstore maps to one original volume in the Kubernetes cluster. Likewise, each DR volume maps to a backup volume in the backupstore. + +DR volumes can be created to accurately reflect backups of a Longhorn volume, but they cannot be used as a normal Longhorn volume until they are activated. + +### ext4 + +A file system for Linux. Longhorn supports ext4 for storage. + +### Frontend expansion + +The frontend here is referring to the block device exposed by the Longhorn volume. + +### Instance Manager + +The Longhorn component for controller/replica instance lifecycle management. + +### Longhorn volume + +A Longhorn volume is a Kubernetes volume that is replicated and managed by the Longhorn Manager. For each volume, the Longhorn Manager also creates: + +- An instance of the Longhorn Engine +- Replicas of the volume, where each replica consists of a series of snapshots of the volume + +Each replica contains a chain of snapshots, which record the changes in the volume's history. Three replicas are created by default, and they are usually stored on separate nodes for high availability. + +### Mount + +A Linux command to mount the block device to a certain directory on the node, e.g. `mount /dev/longhorn/testvol /mnt` + +### NFS + +A [distributed file system protocol](https://en.wikipedia.org/wiki/Network_File_System) that allows you to access files over a computer network, similar to the way that local storage is accessed. Longhorn supports using NFS as a backupstore for secondary storage. + +### Object storage + +Data storage architecture that manages data as objects. 
Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Longhorn volumes can be backed up to S3 compatible object storage. + +### Offline expansion + +In an offline volume expansion, the volume is detached. + +### Overprovisioning + +Overprovisioning allows a server to view more storage capacity than has been physically reserved. That means we can schedule a total of 750 GiB Longhorn volumes on a 200 GiB disk with 50G reserved for the root file system. The **Storage Over Provisioning Percentage** can be configured in the Longhorn [settings.](../references/settings) + +### PersistentVolume + +A PersistentVolume (PV) is a Kubernetes resource that represents piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a cluster-level resource, and is required for pods to use persistent storage that is independent of the lifecycle of any individual pod. For more information, see the official [Kubernetes documentation about persistent volumes.](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) + +### PersistentVolumeClaim + +A PersistentVolumeClaim (PVC) is a request for storage by a user. Pods can request specific levels of resources (CPU and Memory) by using a PVC for storage. Claims can request specific sizes and access modes (e.g., they can be mounted once read/write or many times read-only). + +For more information, see the official [Kubernetes documentation.](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) + +### Primary backups + +The replicas of each Longhorn volume on a Kubernetes cluster can be considered primary backups. + +### Remount + +In a remount, Longhorn will detect and mount the filesystem for the volume after the reattachment. + +### Replica + +A replica consists of a chain of snapshots, showing a history of the changes in the data within a volume. + +### S3 + +[Amazon S3](https://aws.amazon.com/s3/) is an object storage service. + +### Salvage a volume + +The salvage operation is needed when all replicas become faulty, e.g. due to a network disconnection. + +When salvaging a volume, Longhorn will try to figure out which replica(s) are usable, then use them to recover the volume. + +### Secondary backups + +Backups external to the Kubernetes cluster, on S3 or NFS. + +### Snapshot + +A snapshot in Longhorn captures the state of a volume at the time the snapshot is created. Each snapshot only captures changes that overwrite data from earlier snapshots, so a sequence of snapshots is needed to fully represent the full state of the volume. Volumes can be restored from a snapshot. For a longer explanation of snapshots, refer to the [conceptual documentation.](../concepts) + +### Stable identity + +[StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) have a stable identity, which means that Kubernetes won't force delete the Pod for the user. + +### StatefulSet + +A [Kubernetes resource](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) used for managing stateful applications. + +### StorageClass + +A Kubernetes resource that can be used to automatically provision a PersistentVolume for a pod. For more information, refer to the [Kubernetes documentation.](https://kubernetes.io/docs/concepts/storage/storage-classes/#the-storageclass-resource) + +### System Backup + +Longhorn uploads the system backup to the backupstore. 
Each system backup contains the system backup resource bundle of the Longhorn system. + +See [Longhorn System Backup Bundle](../advanced-resources/system-backup-restore/backup-longhorn-system/#longhorn-system-backup-bundle) for details. + +### Thin provisioning + +Longhorn is a thin-provisioned storage system. That means a Longhorn volume will only take the space it needs at the moment. For example, if you allocated a 20 GB volume but only use 1 GB of it, the actual data size on your disk would be 1GB.  + +### Umount + +A [Linux command](https://linux.die.net/man/8/umount) that detaches the file system from the file hierarchy. + +### Volume (Kubernetes concept) + +A volume in Kubernetes allows a pod to store files during the lifetime of the pod. + +These files will still be available after a container crashes, but they will not be available past the lifetime of a pod. To get storage that is still available after the lifetime of a pod, a Kubernetes [PersistentVolume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes) is required. + +For more information, see the Kubernetes documentation on [volumes.](https://kubernetes.io/docs/concepts/storage/volumes/) + +### XFS +A [file system](https://en.wikipedia.org/wiki/XFS) supported by most Linux distributions. Longhorn supports XFS for storage. + +### SMB/CIFS + +A [network file system protocol](https://en.wikipedia.org/wiki/Network_File_System) that allows you to access files over a computer network, similar to the way that local storage is accessed. Longhorn supports using SMB/CIFS as a backupstore for secondary storage. \ No newline at end of file diff --git a/content/docs/1.5.1/volumes-and-nodes/_index.md b/content/docs/1.5.1/volumes-and-nodes/_index.md new file mode 100644 index 000000000..ac66a8716 --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/_index.md @@ -0,0 +1,4 @@ +--- +title: Volumes and Nodes +weight: 2 +--- \ No newline at end of file diff --git a/content/docs/1.5.1/volumes-and-nodes/create-volumes.md b/content/docs/1.5.1/volumes-and-nodes/create-volumes.md new file mode 100644 index 000000000..76b07a324 --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/create-volumes.md @@ -0,0 +1,165 @@ +--- +title: Create Longhorn Volumes +weight: 1 +--- + +In this tutorial, you'll learn how to create Kubernetes persistent storage resources of persistent volumes (PVs) and persistent volume claims (PVCs) that correspond to Longhorn volumes. You will use kubectl to dynamically provision storage for workloads using a Longhorn storage class. For help creating volumes from the Longhorn UI, refer to [this section.](#creating-longhorn-volumes-with-the-longhorn-ui) + +> This section assumes that you understand how Kubernetes persistent storage works. For more information, see the [Kubernetes documentation.](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) + +### Creating Longhorn Volumes with kubectl + +First, you will create a Longhorn StorageClass. The Longhorn StorageClass contains the parameters to provision PVs. + +Next, a PersistentVolumeClaim is created that references the StorageClass. Finally, the PersistentVolumeClaim is mounted as a volume within a Pod. + +When the Pod is deployed, the Kubernetes master will check the PersistentVolumeClaim to make sure the resource request can be fulfilled. If storage is available, the Kubernetes master will create the Longhorn volume and bind it to the Pod. + +1. 
Use following command to create a StorageClass called `longhorn`: + + ``` + kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/examples/storageclass.yaml + ``` + + The following example StorageClass is created: + + ``` + kind: StorageClass + apiVersion: storage.k8s.io/v1 + metadata: + name: longhorn + provisioner: driver.longhorn.io + allowVolumeExpansion: true + parameters: + numberOfReplicas: "3" + staleReplicaTimeout: "2880" # 48 hours in minutes + fromBackup: "" + fsType: "ext4" + # mkfsParams: "-I 256 -b 4096 -O ^metadata_csum,^64bit" + # diskSelector: "ssd,fast" + # nodeSelector: "storage,fast" + # recurringJobSelector: '[ + # { + # "name":"snap", + # "isGroup":true, + # }, + # { + # "name":"backup", + # "isGroup":false, + # } + # ]' + ``` + + In particular, starting with v1.4.0, the parameter `mkfsParams` can be used to specify filesystem format options for each StorageClass. + +2. Create a Pod that uses Longhorn volumes by running this command: + + ``` + kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/v{{< current-version >}}/examples/pod_with_pvc.yaml + ``` + + A Pod named `volume-test` is launched, along with a PersistentVolumeClaim named `longhorn-volv-pvc`. The PersistentVolumeClaim references the Longhorn StorageClass: + + ``` + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: longhorn-volv-pvc + spec: + accessModes: + - ReadWriteOnce + storageClassName: longhorn + resources: + requests: + storage: 2Gi + ``` + + The persistentVolumeClaim is mounted in the Pod as a volume: + + ``` + apiVersion: v1 + kind: Pod + metadata: + name: volume-test + namespace: default + spec: + containers: + - name: volume-test + image: nginx:stable-alpine + imagePullPolicy: IfNotPresent + volumeMounts: + - name: volv + mountPath: /data + ports: + - containerPort: 80 + volumes: + - name: volv + persistentVolumeClaim: + claimName: longhorn-volv-pvc + ``` +More examples are available [here.](../../references/examples) + +### Binding Workloads to PVs without a Kubernetes StorageClass + +It is possible to use a Longhorn StorageClass to bind a workload to a PV without creating a StorageClass object in Kubernetes. + +Since the Storage Class is also a field used to match a PVC with a PV, which doesn't have to be created by a Provisioner, you can create a PV manually with a custom StorageClass name, then create a PVC asking for the same StorageClass name. + +When a PVC requests a StorageClass that does not exist as a Kubernetes resource, Kubernetes will try to bind your PVC to a PV with the same StorageClass name. The StorageClass will be used like a label to find the matching PV, and only existing PVs labeled with the StorageClass name will be used. + +If the PVC names a StorageClass, Kubernetes will: + +1. Look for an existing PV that has the label matching the StorageClass +2. Look for an existing StorageClass Kubernetes resource. If the StorageClass exists, it will be used to create a PV. + +### Creating Longhorn Volumes with the Longhorn UI + +Since the Longhorn volume already exists while creating PV/PVC, a StorageClass is not needed for dynamically provisioning Longhorn volume. However, the field `storageClassName` should be set in PVC/PV, to be used for PVC bounding purpose. And it's unnecessary for users to create the related StorageClass object. + +By default the StorageClass for Longhorn created PV/PVC is `longhorn-static`. 
Users can modify it in `Setting - General - Default Longhorn Static StorageClass Name` as they need. + +Users need to manually delete PVC and PV created by Longhorn. + + +### PV/PVC Creation for Existing Longhorn Volume + +Now users can create PV/PVC via our Longhorn UI for the existing Longhorn volumes. +Only detached volume can be used by a newly created pod. + +### The Failure of the Longhorn Volume Creation + +Creating a Longhorn volume will fail if there are no available nodes, disks, or insufficient storage. The failures are categorized into: +- insufficient storage, +- disk not found, +- disks are unavailable, +- failed to retrieve scheduling settings failed to retrieve, +- tags not fulfilled, +- node not found, +- nodes are unavailable, +- none of the node candidates contains a ready engine image, +- hard affinity cannot be satisfied, +- replica scheduling failed. + +The failure results in the workload failing to use the provisioned PV and showing a warning message +``` +# kubectl describe pod workload-test + +Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Warning FailedAttachVolume 14s (x8 over 82s) attachdetach-controller AttachVolume.Attach + failed for volume "pvc-e130e369-274d-472d-98d1-f6074d2725e8" : rpc error: code = Aborted + desc = volume pvc-e130e369-274d-472d-98d1-f6074d2725e8 is not ready for workloads +``` + +In order to help users understand the error causes, Longhorn summarizes them in the PV annotation, `longhorn.io/volume-scheduling-error`. Failures are combined in this annotation and separated by a semicolon, for example, `longhorn.io/volume-scheduling-error: insufficient storage;disks are unavailable`. The annotation can be checked by using `kubectl describe pv `. +``` +# kubectl describe pv pvc-e130e369-274d-472d-98d1-f6074d2725e8 +Name: pvc-e130e369-274d-472d-98d1-f6074d2725e8 +Labels: +Annotations: longhorn.io/volume-scheduling-error: insufficient storage + pv.kubernetes.io/provisioned-by: driver.longhorn.io + +... + +``` \ No newline at end of file diff --git a/content/docs/1.5.1/volumes-and-nodes/delete-volumes.md b/content/docs/1.5.1/volumes-and-nodes/delete-volumes.md new file mode 100644 index 000000000..d354e773b --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/delete-volumes.md @@ -0,0 +1,19 @@ +--- +title: Delete Longhorn Volumes +weight: 1 +--- +Once you are done utilizing a Longhorn volume for storage, there are a number of ways to delete the volume, depending on how you used the volume. + +## Deleting Volumes Through Kubernetes +> **Note:** This method only works if the volume was provisioned by a StorageClass and the PersistentVolume for the Longhorn volume has its Reclaim Policy set to Delete. + +You can delete a volume through Kubernetes by deleting the PersistentVolumeClaim that uses the provisioned Longhorn volume. This will cause Kubernetes to clean up the PersistentVolume and then delete the volume in Longhorn. + +## Deleting Volumes Through Longhorn +All Longhorn volumes, regardless of how they were created, can be deleted through the Longhorn UI. + +To delete a single volume, go to the Volume page in the UI. Under the Operation dropdown, select Delete. You will be prompted with a confirmation before deleting the volume. + +To delete multiple volumes at the same time, you can check multiple volumes on the Volume page and select Delete at the top. 
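
For the Kubernetes path described above, removing the PersistentVolumeClaim is a single command. A minimal sketch, assuming the PVC from the earlier example (`longhorn-volv-pvc` in the `default` namespace) and a PersistentVolume with a `Delete` reclaim policy:

```
kubectl delete pvc longhorn-volv-pvc -n default
```

Kubernetes then cleans up the bound PersistentVolume, and Longhorn deletes the underlying volume.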

> **Note:** If Longhorn detects that a volume is tied to a PersistentVolume or PersistentVolumeClaim, then these resources will also be deleted once you delete the volume. You will be warned in the UI about this before proceeding with deletion. Longhorn will also warn you when deleting an attached volume, since it may be in use.

diff --git a/content/docs/1.5.1/volumes-and-nodes/detaching-volumes.md b/content/docs/1.5.1/volumes-and-nodes/detaching-volumes.md
new file mode 100644
index 000000000..e83917d63
--- /dev/null
+++ b/content/docs/1.5.1/volumes-and-nodes/detaching-volumes.md
@@ -0,0 +1,54 @@
---
title: Detaching Volumes
weight: 8
---

Shut down all Kubernetes Pods using Longhorn volumes in order to detach the volumes. The easiest way to achieve this is to delete all workloads and recreate them after the upgrade. If this is not desirable, some workloads can be suspended instead.

In this section, you'll learn how each workload can be modified to shut down its pods.

#### Deployment
Edit the deployment with `kubectl edit deploy/`.

Set `.spec.replicas` to `0`.

#### StatefulSet
Edit the statefulset with `kubectl edit statefulset/`.

Set `.spec.replicas` to `0`.

#### DaemonSet
There is no way to suspend this workload.

Delete the daemonset with `kubectl delete ds/`.

#### Pod
Delete the pod with `kubectl delete pod/`.

There is no way to suspend a pod not managed by a workload controller.

#### CronJob
Edit the cronjob with `kubectl edit cronjob/`.

Set `.spec.suspend` to `true`.

Wait for any currently executing jobs to complete, or terminate them by deleting relevant pods.

#### Job
Consider allowing the single-run job to complete.

Otherwise, delete the job with `kubectl delete job/`.

#### ReplicaSet
Edit the replicaset with `kubectl edit replicaset/`.

Set `.spec.replicas` to `0`.

#### ReplicationController
Edit the replicationcontroller with `kubectl edit rc/`.

Set `.spec.replicas` to `0`.

Wait for the volumes used by Kubernetes to finish detaching.

Then detach all remaining volumes from the Longhorn UI. These volumes were most likely created and attached outside of Kubernetes via the Longhorn UI or REST API.

diff --git a/content/docs/1.5.1/volumes-and-nodes/disks-or-nodes-eviction.md b/content/docs/1.5.1/volumes-and-nodes/disks-or-nodes-eviction.md
new file mode 100644
index 000000000..a4de7caad
--- /dev/null
+++ b/content/docs/1.5.1/volumes-and-nodes/disks-or-nodes-eviction.md
@@ -0,0 +1,38 @@
---
title: Evicting Replicas on Disabled Disks or Nodes
weight: 5
---

Longhorn supports automatically evicting the replicas on selected disabled disks or nodes to other suitable disks and nodes, while maintaining the same level of high availability during the eviction.

> **Note:** This eviction feature can only be enabled when the selected disks or nodes have scheduling disabled. During the eviction, the selected disks or nodes cannot be re-enabled for scheduling.

> **Note:** This eviction feature works for volumes that are `Attached` and `Detached`. If the volume is `Detached`, Longhorn will automatically attach it before the eviction and detach it once the eviction is done.

By default, `Eviction Requested` for disks or nodes is `false`. To keep the same level of high availability during the eviction, Longhorn only evicts one replica per volume at a time, and only after the replica rebuild for that volume succeeds.
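
Besides the UI workflow described in the next sections, the same request can be expressed directly on the `node.longhorn.io` resource. The following is only a sketch for a single disk: the disk name `disk-1` and the node name are illustrative, and the fields are the standard disk entries under `Spec.Disks` of the Longhorn node resource.

```yaml
# kubectl -n longhorn-system edit node.longhorn.io <node-name>
spec:
  disks:
    disk-1:
      allowScheduling: false    # scheduling must be disabled before eviction
      evictionRequested: true   # ask Longhorn to move this disk's replicas to other disks/nodes
```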
+ +## Select Disks or Nodes for Eviction + +To evict disks for a node, + +1. Head to the `Node` tab, select one of the nodes, and select `Edit Node and Disks` in the dropdown menu. +1. Make sure the disk is disabled for scheduling and set `Scheduling` to `Disable`. +2. Set `Eviction Requested` to `true` and save. + +To evict a node, + +1. Head to the `Node` tab, select one or more nodes, and click `Edit Node`. +1. Make sure the node is disabled for scheduling and set `Scheduling` to `Disable`. +2. Set `Eviction Requested` to `true`, and save. + +## Cancel Disks or Nodes Eviction + +To cancel the eviction for a disk or a node, set the corresponding `Eviction Requested` setting to `false`. + +## Check Eviction Status + +The `Replicas` number on the selected disks or nodes should be reduced to 0 once the eviction is a success. + +If you click on the `Replicas` number, it will show the replica name on this disk. When you click on the replica name, the Longhorn UI will redirect the webpage to the corresponding volume page, and it will display the volume status. If there is any error, e.g. no space, or couldn't find another schedulable disk (schedule failure), the error will be shown. All of the errors will be logged in the Event log. + +If any error happened during the eviction, the eviction will be suspended until new space has been cleared or it will be cancelled. And if the eviction is cancelled, the remaining replicas on the selected disks or nodes will remain on the disks or nodes. diff --git a/content/docs/1.5.1/volumes-and-nodes/expansion.md b/content/docs/1.5.1/volumes-and-nodes/expansion.md new file mode 100644 index 000000000..825c659ad --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/expansion.md @@ -0,0 +1,133 @@ +--- + title: Volume Expansion + weight: 4 +--- + +Volumes are expanded in two stages. First, Longhorn resizes the block device, then it expands the filesystem. + +Since v1.4.0, Longhorn supports online expansion. Most of the time Longhorn can directly expand an attached volumes without limitations, no matter if the volume is being R/W or rebuilding. + +If the volume was not expanded though the CSI interface (e.g. for Kubernetes older than v1.16), the capacity of the corresponding PVC and PV won't change. + +## Prerequisite + +- For offline expansion, the Longhorn version must be v0.8.0 or higher. +- For online expansion, the Longhorn version must be v1.4.0 or higher. + +## Expand a Longhorn volume + +There are two ways to expand a Longhorn volume: with a PersistentVolumeClaim (PVC) and with the Longhorn UI. + +#### Via PVC + +This method is applied only if: + +- The PVC is dynamically provisioned by the Kubernetes with Longhorn StorageClass. +- The field `allowVolumeExpansion` should be `true` in the related StorageClass. + +This method is recommended if it's applicable, because the PVC and PV will be updated automatically and everything is kept consistent after expansion. 
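
As a quick sketch of that modification (the PVC name matches the full example below, and the new `2Gi` size is illustrative), the storage request can be bumped with a one-line patch instead of editing the manifest by hand:

```
kubectl patch pvc longhorn-simple-pvc -n default --type merge -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'
```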
+ +Usage: Find the corresponding PVC for Longhorn volume, then modify the requested `spec.resources.requests.storage` of the PVC: + +``` +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + annotations: + kubectl.kubernetes.io/last-applied-configuration: | + {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"longhorn-simple-pvc","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":"longhorn"}} + pv.kubernetes.io/bind-completed: "yes" + pv.kubernetes.io/bound-by-controller: "yes" + volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io + creationTimestamp: "2019-12-21T01:36:16Z" + finalizers: + - kubernetes.io/pvc-protection + name: longhorn-simple-pvc + namespace: default + resourceVersion: "162431" + selfLink: /api/v1/namespaces/default/persistentvolumeclaims/longhorn-simple-pvc + uid: 0467ae73-22a5-4eba-803e-464cc0b9d975 +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 1Gi + storageClassName: longhorn + volumeMode: Filesystem + volumeName: pvc-0467ae73-22a5-4eba-803e-464cc0b9d975 +status: + accessModes: + - ReadWriteOnce + capacity: + storage: 1Gi + phase: Bound +``` + +#### Via Longhorn UI + +Usage: On the volume page of Longhorn UI, click `Expand` for the volume. + +## Filesystem expansion + +Longhorn will try to expand the file system only if: + +- The expanded size should be greater than the current size. +- There is a Linux filesystem in the Longhorn volume. +- The filesystem used in the Longhorn volume is one of the following: + - ext4 + - xfs +- The Longhorn volume is using the block device frontend. + +## Corner cases + +#### Handling Volume Revert + +If a volume is reverted to a snapshot with smaller size, the frontend of the volume is still holding the expanded size. But the filesystem size will be the same as that of the reverted snapshot. In this case, you will need to handle the filesystem manually: + +1. Attach the volume to a random node. +2. Log in to the corresponding node, and expand the filesystem. + + If the filesystem is `ext4`, the volume might need to be [mounted](https://linux.die.net/man/8/mount) and [umounted](https://linux.die.net/man/8/umount) once before resizing the filesystem manually. Otherwise, executing `resize2fs` might result in an error: + + ``` + resize2fs: Superblock checksum does not match superblock while trying to open ...... + Couldn't find valid filesystem superblock. + ``` + + Follow the steps below to resize the filesystem: + + ``` + mount /dev/longhorn/ + umount /dev/longhorn/ + mount /dev/longhorn/ + resize2fs /dev/longhorn/ + umount /dev/longhorn/ + ``` + +3. If the filesystem is `xfs`, you can directly mount, then expand the filesystem. + + ``` + mount /dev/longhorn/ + xfs_growfs + umount /dev/longhorn/ + ``` + +#### Encrypted volume + +Due to [the upstream limitation](https://kubernetes.io/blog/2022/09/21/kubernetes-1-25-use-secrets-while-expanding-csi-volumes-on-node-alpha/), Longhorn cannot handle **online** expansion for encrypted volumes automatically unless you enable the feature gate `CSINodeExpandSecret`. + +If you cannot enable it but still prefer to do online expansion, you can: +1. Login the node host the encrypted volume is attached to. +2. Execute `cryptsetup resize `. The passphrase this command requires is the field `CRYPTO_KEY_VALUE` of the corresponding secret. +3. Expand the filesystem. 
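
Putting those three steps together, here is a minimal sketch. It assumes an ext4 filesystem and that the decrypted device appears as `/dev/mapper/<volume-name>`; adjust the device path for your setup, and use `xfs_growfs` instead of `resize2fs` for XFS:

```shell
# On the node the encrypted volume is attached to:
cryptsetup resize <volume-name>       # prompts for the passphrase (the CRYPTO_KEY_VALUE field of the secret)
resize2fs /dev/mapper/<volume-name>   # grow the ext4 filesystem to the new device size
```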
+ +#### RWX volume + +Currently, Longhorn is unable to expand the filesystem (NFS) for RWX volumes. - If you decide to expand a RWX volume manually, you can: + +1. Expand the block device of the RWX volume via PVC or UI. +2. Figure out the share manager pod of the RWX volume then execute the filesystem expansion command. The share manager pod is typically named as `share-manager-`. + ```shell + kubectl -n longhorn-system exec -it -- resize2fs /dev/longhorn/ + ``` diff --git a/content/docs/1.5.1/volumes-and-nodes/maintenance.md b/content/docs/1.5.1/volumes-and-nodes/maintenance.md new file mode 100644 index 000000000..79b804228 --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/maintenance.md @@ -0,0 +1,102 @@ +--- +title: Node Maintenance and Kubernetes Upgrade Guide +weight: 6 +--- + +This section describes how to handle planned node maintenance or upgrading Kubernetes version for the cluster. + +- [Updating the Node OS or Container Runtime](#updating-the-node-os-or-container-runtime) +- [Removing a Disk](#removing-a-disk) + - [Reusing the Node Name](#reusing-the-node-name) +- [Removing a Node](#removing-a-node) +- [Upgrading Kubernetes](#upgrading-kubernetes) + - [In-place Upgrade](#in-place-upgrade) + - [Managed Kubernetes](#managed-kubernetes) + +## Updating the Node OS or Container Runtime + +1. Cordon the node. Longhorn will automatically disable the node scheduling when a Kubernetes node is cordoned. + +1. Drain the node to move the workload to somewhere else. + + You will need to use `--ignore-daemonsets` to drain the node. + The `--ignore-daemonsets` is needed because Longhorn deployed some daemonsets such as `Longhorn manager`, `Longhorn CSI plugin`, `engine image`. + + The running replicas on the node will be stopped at this stage. They will be shown as `Failed`. + + > **Note:** + > By default, if there is one last healthy replica for a volume on + > the node, Longhorn will prevent the node from completing the drain + > operation, to protect the last replica and prevent the disruption of the + > workload. You can control this behavior in the setting [Node Drain Policy](../../references/settings#node-drain-policy), or [evict + > the replica to other nodes before draining](../disks-or-nodes-eviction). + + The engine processes on the node will be migrated with the Pod to other nodes. + > **Note:** For volumes that are not attached through the CSI flow on the node (for example, manually attached using UI), + > they will not be automatically attached to new nodes by Kubernetes during the draining. + > Therefore, Longhorn will prevent the node from completing the drain operation. + > User would need to handle detachment for these volumes to unblock the draining. + + After the `drain` is completed, there should be no engine or replica process running on the node. Two instance managers will still be running on the node, but they're stateless and won't cause interruption to the existing workload. + + > **Note:** Normally you don't need to evict the replicas before the drain + > operation, as long as you have healthy replicas on other nodes. The replicas + > can be reused later, once the node back online and uncordoned. + +1. Perform the necessary maintenance, including shutting down or rebooting the node. +1. Uncordon the node. Longhorn will automatically re-enable the node scheduling. + If there are existing replicas on the node, Longhorn might use those + replicas to speed up the rebuilding process. 
You can set the [Replica + Replenishment Wait Interval](../../references/settings#replica-replenishment-wait-interval) setting to customize how long Longhorn should + wait for potentially reusable replica to be available. + +## Removing a Disk +To remove a disk: +1. Disable the disk scheduling. +1. Evict all the replicas on the disk. +1. Delete the disk. + +### Reusing the Node Name + +These steps also apply if you've replaced a node using the same node name. Longhorn will recognize that the disks are different once the new node is up. You will need to remove the original disks first and add them back for the new node if it uses the same name as the previous node. + +## Removing a Node +To remove a node: +1. Disable the disk scheduling. +1. Evict all the replicas on the node. +1. Detach all the volumes on the node. + + If the node has been drained, all the workloads should be migrated to another node already. + + If there are any other volumes remaining attached, detach them before continuing. + +1. Remove the node from Longhorn using the `Delete` in the `Node` tab. + + Or, remove the node from Kubernetes, using: + + kubectl delete node + +1. Longhorn will automatically remove the node from the cluster. + +## Upgrading Kubernetes + +### In-place Upgrade +In-place upgrade is upgrading method in which nodes are upgraded without being removed from the cluster. +Some example solutions that use this upgrade methods are [k3s automated upgrades](https://docs.k3s.io/upgrades/automated), [Rancher's Kubernetes upgrade guide](https://rancher.com/docs/rancher/v2.x/en/cluster-admin/upgrading-kubernetes/#upgrading-the-kubernetes-version), +[Kubeadm upgrade](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/), etc... + +With the assumption that node and disks are not being deleted/removed, the recommended upgrading guide is: +1. You should cordon and drain a node before upgrading Kubernetes components on a node. + Draining instruction is similar to the drain instruction at [Updating the Node OS or Container Runtime](#updating-the-node-os-or-container-runtime) +2. The drain `--timeout` should be big enough so that replica rebuildings on healthy node can finish. + The more Longhorn replicas you have on the draining node, the more time it takes for the Longhorn replicas to be rebuilt on other healthy nodes. + We recommending you to test and select a big enough value or set it to 0 (aka never timeout). +3. The number of nodes doing upgrade at a time should be smaller than the number of Longhorn replicas for each volume. + This is so that a running Longhorn volume has at least one healthy replica running at a time. +4. Set the setting [Node Drain Policy](../../references/settings#node-drain-policy) to `allow-if-replica-is-stopped` so that the drain is not blocked by the last healthy replica of a detached volume. + + +### Managed Kubernetes +See the instruction at [Support Managed Kubernetes Service](../../advanced-resources/support-managed-k8s-service) + + diff --git a/content/docs/1.5.1/volumes-and-nodes/multidisk.md b/content/docs/1.5.1/volumes-and-nodes/multidisk.md new file mode 100644 index 000000000..3cde7bc3c --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/multidisk.md @@ -0,0 +1,51 @@ +--- +title: Multiple Disk Support +weight: 5 +--- + +Longhorn supports using more than one disk on the nodes to store the volume data. + +By default, `/var/lib/longhorn` on the host will be used for storing the volume data. 
You can avoid using the default directory by adding new disks and then disabling scheduling for `/var/lib/longhorn`.

## Add a Disk

To add a new disk for a node, head to the `Node` tab, select one of the nodes, and select `Edit Disks` in the dropdown menu.

To add any additional disks, you need to:
1. Mount the disk on the host to a certain directory.
2. Add the path of the mounted disk into the disk list of the node.

Longhorn will automatically detect the storage information (e.g. maximum space, available space) about the disk, and start scheduling to it if it can accommodate the volume. A path already mounted by an existing disk won't be allowed.

A certain amount of disk space can be reserved to stop Longhorn from using it. It can be set in the `Space Reserved` field for the disk. This is useful for non-dedicated storage disks on the node.

The kubelet needs to preserve node stability when available compute resources are low. This is especially important when dealing with incompressible compute resources, such as memory or disk space. If such resources are exhausted, nodes become unstable. To avoid the kubelet `Disk pressure` issue after scheduling several volumes, Longhorn by default reserves 30% of the root disk space (`/var/lib/longhorn`) to ensure node stability.

> **Note**:
> Since Longhorn uses filesystem ID to detect duplicate mounts of the same filesystem, you cannot add a disk that has the same filesystem ID as an existing disk on the same node.
> See more details at https://github.com/longhorn/longhorn/issues/2477

### Use an Alternative Path for a Disk on the Node

If you don't want to use the original mount path of a disk on the node, you can use `mount --bind` to create an alternative/alias path for the disk, then use it with Longhorn. Notice that a soft link created with `ln -s` won't work, since it will not get populated correctly inside the pod.

Longhorn will identify the disk using the path, so you need to make sure the alternative path is correctly mounted when the node reboots, e.g. by adding it to `fstab`.

## Remove a Disk
Nodes and disks can be excluded from future scheduling. Notice that any scheduled storage space won't be released automatically if the scheduling was disabled for the node.

In order to remove a disk, two conditions need to be met:
- The scheduling for the disk must be disabled
- There is no existing replica using the disk, including any replicas in an error state. For how to evict replicas from disabled disks, refer to [Select Disks or Nodes for Eviction](../disks-or-nodes-eviction/#select-disks-or-nodes-for-eviction)

Once those two conditions are met, you should be allowed to remove the disk.

## Configuration
There are two global settings that affect the scheduling of volumes.

- `StorageOverProvisioningPercentage` defines the upper bound of `ScheduledStorage / (MaximumStorage - ReservedStorage)`. The default value is `100` (%). That means we can schedule a total of 150 GiB of Longhorn volumes on a 200 GiB disk with 50 GiB reserved for the root file system. Over-provisioning is acceptable because volumes normally won't use their full nominal size, and Longhorn stores them as sparse files.
- `StorageMinimalAvailablePercentage` defines when a disk cannot be scheduled with more volumes. The default value is `10` (%). The bigger value between `MaximumStorage * StorageMinimalAvailablePercentage / 100` and `MaximumStorage - ReservedStorage` will be used to determine if a disk is running low and cannot be scheduled with more volumes.
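
To make the first setting concrete with the numbers used above (a 200 GiB disk, 50 GiB reserved, and the default over-provisioning percentage of 100; the figures are only illustrative):

```
MaximumStorage  = 200 GiB
ReservedStorage =  50 GiB

Schedulable upper bound = (MaximumStorage - ReservedStorage) * StorageOverProvisioningPercentage / 100
                        = (200 GiB - 50 GiB) * 100 / 100
                        = 150 GiB of Longhorn volumes
```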
+ +Notice that currently there is no guarantee that the space volumes use won't exceed the `StorageMinimalAvailablePercentage`, because: +1. Longhorn volumes can be bigger than the specified size, due to fact that the snapshot contains the old state of the volume. +2. Longhorn does over-provisioning by default. diff --git a/content/docs/1.5.1/volumes-and-nodes/node-space-usage.md b/content/docs/1.5.1/volumes-and-nodes/node-space-usage.md new file mode 100644 index 000000000..982a81842 --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/node-space-usage.md @@ -0,0 +1,35 @@ +--- +title: Node Space Usage +weight: 1 +--- + +In this section, you'll have a better understanding of the space usage info presented by the Longhorn UI. + + +### Whole Cluster Space Usage + +In `Dashboard` page, Longhorn will show you the cluster space usage info: + +{{< figure src="/img/screenshots/volumes-and-nodes/space-usage-info-dashboard-page.png" >}} + +`Schedulable`: The actual space that can be used for Longhorn volume scheduling. + +`Reserved`: The space reserved for other applications and system. + +`Used`: The actual space that has been used by Longhorn, system, and other applications. + +`Disabled`: The total space of the disks/nodes on which Longhorn volumes are not allowed for scheduling. + +### Space Usage of Each Node + +In `Node` page, Longhorn will show the space allocation, schedule, and usage info for each node: + +{{< figure src="/img/screenshots/volumes-and-nodes/space-usage-info-node-page.png" >}} + +`Size` column: The **max actual available space** that can be used by Longhorn volumes. It equals the total disk space of the node minus reserved space. + +`Allocated` column: The left number is the size that has been used for **volume scheduling**, and it does not mean the space has been used for the Longhorn volume data store. The right number is the **max** size for volume scheduling, which the result of `Size` multiplying `Storage Over Provisioning Percentage`. (In the above illustration, `Storage Over Provisioning Percentage` is 500.) Hence, the difference between the 2 numbers (let's call it as the allocable space) determines if a volume replica can be scheduled to this node. + +`Used` column: The left part indicates the currently used space of this node. The whole bar indicates the total space of the node. + +Notice that the allocable space may be greater than the actual available space of the node when setting `Storage Over Provisioning Percentage` to a value greater than 100. If the volumes are heavily used and lots of historical data will be stored in the volume snapshots, please be careful about using a large value for this setting. For more info about the setting, see [here](../../references/settings/#storage-over-provisioning-percentage) for details. diff --git a/content/docs/1.5.1/volumes-and-nodes/scheduling.md b/content/docs/1.5.1/volumes-and-nodes/scheduling.md new file mode 100644 index 000000000..dae01972f --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/scheduling.md @@ -0,0 +1,50 @@ +--- +title: Scheduling +--- + +In this section, you'll learn how Longhorn schedules replicas based on multiple factors. + +### Scheduling Policy + +Longhorn's scheduling policy has two stages. The scheduler only goes to the next stage if the previous stage is satisfied. Otherwise, the scheduling will fail. + +If any tag has been set in order to be selected for scheduling, the node tag and the disk tag have to match when the node or the disk is selected. 
+ +The first stage is the **node and zone selection stage.** Longhorn will filter the node and zone based on the `Replica Node Level Soft Anti-Affinity` and `Replica Zone Level Soft Anti-Affinity` settings. + +The second stage is the **disk selection stage.** Longhorn will filter the disks that satisfy the first stage based on the `Storage Minimal Available Percentage`, `Storage Over Provisioning Percentage`, and other disk-related factors like requested disk space. + +#### The Node and Zone Selection Stage + +First, Longhorn will always try to schedule the new replica on a new node with a new zone if possible. In this context, "new" means that a replica for the volume has not already been scheduled to the zone or node, and "existing" refers to a node or zone that already has a replica scheduled to it. + +At this time, if both the `Replica Node Level Soft Anti-Affinity` and `Replica Zone Level Soft Anti-Affinity` settings are un-checked, and if there is no new node with a new zone, Longhorn will not schedule the replica. + +Then, Longhorn will look for a new node with an existing zone. If possible, it will schedule the new replica on the new node with an existing zone. + +At this time, if `Replica Node Level Soft Anti-Affinity` is un-checked and `Replica Zone Level Soft Anti-Affinity` is checked, and there is no new node with an existing zone, Longhorn will not schedule the replica. + +Last, Longhorn will look for an existing node with an existing zone to schedule the new replica. At this time both `Replica Node Level Soft Anti-Affinity` and `Replica Zone Level Soft Anti-Affinity` should be checked. + +#### Disk Selection Stage + +Once the node and zone stage is satisfied, Longhorn will decide if it can schedule the replica on the disk of the node. Longhorn will check the available disks on the selected node with the matching tag, the total disk space, and the available disk space. + +For example, after the node and zone stage, Longhorn finds `Node A` satisfies the requirements for scheduling a replica to the node. Longhorn will check all the available disks on this node. + +Assume this node has two disks: `Disk X` with available space 1 GB, and `Disk Y` with available space 2 GB. And the replica Longhorn going to schedule needs 1 GB. With default `Storage Minimal Available Percentage` 25, Longhorn can only schedule the replica on `Disk Y` if this `Disk Y` matches the disk tag, otherwise Longhorn will return failure on this replica selection. But if the `Storage Minimal Available Percentage` is set to 0, and `Disk X` also matches the disk tag, Longhorn can schedule the replica on `Disk X`. + +### Settings + +For more information on settings that are relevant to scheduling replicas on nodes and disks, refer to the settings reference: + +- [Disable Scheduling On Cordoned Node](../../references/settings/#disable-scheduling-on-cordoned-node) +- [Replica Soft Anti-Affinity](../../references/settings/#replica-node-level-soft-anti-affinity) (also called Replica Node Level Soft Anti-Affinity) +- [Replica Zone Level Soft Anti-Affinity](../../references/settings/#replica-zone-level-soft-anti-affinity) +- [Storage Minimal Available Percentage](../../references/settings/#storage-minimal-available-percentage) +- [Storage Over Provisioning Percentage](../../references/settings/#storage-over-provisioning-percentage) + +### Notice +Longhorn relies on label `topology.kubernetes.io/zone=` or `topology.kubernetes.io/region=` in the Kubernetes node object to identify the zone/region. 
+ +Since these are reserved and used by Kubernetes as [well-known labels](https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone). diff --git a/content/docs/1.5.1/volumes-and-nodes/storage-tags.md b/content/docs/1.5.1/volumes-and-nodes/storage-tags.md new file mode 100644 index 000000000..048460d96 --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/storage-tags.md @@ -0,0 +1,75 @@ +--- +title: Storage Tags +weight: 3 +--- + +## Overview + +The storage tag feature enables only certain nodes or disks to be used for storing Longhorn volume data. For example, performance-sensitive data can use only the high-performance disks which can be tagged as `fast`, `ssd` or `nvme`, or only the high-performance node tagged as `baremetal`. + +This feature supports both disks and nodes. + +## Setup + +The tags can be set up using the Longhorn UI: + +1. *Node -> Select one node -> Edit Node and Disks* +2. Click `+New Node Tag` or `+New Disk Tag` to add new tags. + +All the existing scheduled replica on the node or disk won't be affected by the new tags. + +## Usage + +When multiple tags are specified for a volume, the disk and the node (the disk belong to) must have all the specified tags to become usable. + +### UI + +When creating a volume, specify the disk tag and node tag in the UI. + +### Kubernetes + +Use Kubernetes StorageClass parameters to specify tags. + +You can specify tags in the default Longhorn StorageClass by adding parameter `nodeSelector: "storage,fast"` in the ConfigMap named `longhorn-storageclass`. +For example: + +```yaml +apiVersion: v1 +kind: ConfigMap +data: + storageclass.yaml: | + kind: StorageClass + apiVersion: storage.k8s.io/v1 + metadata: + name: longhorn + annotations: + storageclass.kubernetes.io/is-default-class: "true" + provisioner: driver.longhorn.io + allowVolumeExpansion: true + reclaimPolicy: "Delete" + volumeBindingMode: Immediate + parameters: + numberOfReplicas: "3" + staleReplicaTimeout: "480" + diskSelector: "ssd" + nodeSelector: "storage,fast" +``` +If Longhorn is installed via Helm, you can achieve that by editing `persistence.defaultNodeSelector` in [values.yaml](https://github.com/longhorn/longhorn/blob/v{{< current-version >}}/chart/values.yaml). + +Alternatively, a custom storageClass setting can be used, e.g.: +```yaml +kind: StorageClass +apiVersion: storage.k8s.io/v1 +metadata: + name: longhorn-fast +provisioner: driver.longhorn.io +parameters: + numberOfReplicas: "3" + staleReplicaTimeout: "480" # 8 hours in minutes + diskSelector: "ssd" + nodeSelector: "storage,fast" +``` + +## History +* [Original feature request](https://github.com/longhorn/longhorn/issues/311) +* Available since v0.6.0 diff --git a/content/docs/1.5.1/volumes-and-nodes/trim-filesystem.md b/content/docs/1.5.1/volumes-and-nodes/trim-filesystem.md new file mode 100644 index 000000000..a5773cedd --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/trim-filesystem.md @@ -0,0 +1,78 @@ +--- +title: Trim Filesystem +weight: 4 +--- + +Since v1.4.0, Longhorn supports trimming filesystem inside Longhorn volumes. Trimming will reclaim space wasted by the removed files of the filesystem. + +> **Note:** +> - Trying to trim a removed files from a valid snapshot will do nothing but the filesystem will discard this kind of in-memory trimmable file info. Later on if you mark the snapshot as removed and want to retry the trim, you may need to unmount and remount the filesystem so that the filesystem can recollect the trimmable file info. 
+> +> - If you allow automatically removing snapshots during filesystem trim, please be careful of using mount option `discard`, which will trigger the snapshot removal frequently then interrupt some operations like backup creation. + +## Prerequisites + +- The Longhorn version must be v1.4.0 or higher. +- There is a trimmable filesystem like EXT4 or XFS inside the Longhorn volume. +- The volume is attached and mounted on a mount point before trimming. + +## Trim the filesystem in a Longhorn volume + +There are two ways to do trim for a Longhorn volume: with the Longhorn UI and directly via cmd. + +#### Via Longhorn UI + +You can directly click volume operation `Trim Filesystem` for attached volumes. + +Then Longhorn will **try its best** to figure out the mount point and execute `fstrim `. If something is wrong or the filesystem does not exist, the UI will return an error. + +#### Via cmd + +Users need to figure out the mount point of the volume then manually execute `fstrim `. + +## Automatically Remove Snapshots During Filesystem Trim + +By design each valid snapshot of a Longhorn volume is immutable. Hence Longhorn filesystem trim feature can be applied to **the volume head and the followed continuous removed or system snapshots only**. + +#### The Global Setting "Remove Snapshots During Filesystem Trim" + +To help reclaim as much space as possible automatically, Longhorn introduces [setting _Remove Snapshots During Filesystem Trim_](../../references/settings/#remove-snapshots-during-filesystem-trim). This allows Longhorn filesystem trim feature to automatically mark the latest snapshot and its ancestors as removed and stops at the snapshot containing multiple children. As a result, Longhorn can reclaim space for as more snapshots as possible. + +#### The Volume Spec Field "UnmapMarkSnapChainAsRemoved" + +Of course there is a per-volume field `volume.Spec.UnmapMarkSnapChainAsRemoved` would overwrite the global setting mentioned above. + +There are 3 options for this volume field: `ignored`, `enabled`, and `disabled`. `ignored` means following the global setting, which is the default value. + +You can directly set this field in the StoragaClasses so that the customized value can be applied to all volumes created by the StorageClasses. + +## Known Issues & Limitations + +### RWX volumes +- Currently, Longhorn **UI** only supports filesystem trimming for RWO volume. It will be enhanced for RWX volume at https://github.com/longhorn/longhorn/issues/5143. + +- If you want to trim a RWX volume manually, you can: + 1. Figure out and enter into the share manager pod of the RWX volume, which actually contains the NFS server. The share manager pod is typically named as `share-manager-`. + ```shell + kubectl -n longhorn-system exec -it -- bash + ``` + 2. Figure out the work directory of the NFS server. The work directory is typically like `/export/`: + ```shell + mount | grep /dev/longhorn/ + /dev/longhorn/ on /export/ type ext4 (rw,relatime) + ``` + 3. Trim the work directory + ```shell + fstrim /export/ + ``` + +### Encrypted volumes +- By default, TRIM commands are not enabled by the device-mapper. You can check [this doc](https://wiki.archlinux.org/title/Dm-crypt/Specialties#Discard/TRIM_support_for_solid_state_drives_(SSD)) for details. + +- If you still want to trim an encrypted Longhorn volume, you can: + 1. Enter into the node host the volume is attached to. + 2. Enable flag `discards` for the encrypted volume. 
The passphrase is recorded in the corresponding secert: + ```shell + cryptsetup --allow-discards --persistent refresh + ``` + 3. Directly use Longhorn UI to trim the volume or execute `fstrim` for **the mount point** of `/dev/mapper/` manually. diff --git a/content/docs/1.5.1/volumes-and-nodes/volume-size.md b/content/docs/1.5.1/volumes-and-nodes/volume-size.md new file mode 100644 index 000000000..7402750a0 --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/volume-size.md @@ -0,0 +1,163 @@ +--- +title: Volume Size +weight: 1 +--- + +In this section, you'll have a better understanding of concepts related to volume size. + +## Volume `Size`: +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-nominal-size.png" >}} +- It is what you set during the volume creation, and we will call it nominal size in this doc to avoid ambiguity. +- Since the volume itself is just a CRD object in Kubernetes and the data is stored in each replica, this is actually the nominal size of each replica. +- The reason we call this field as "nominal size" is that Longhorn replicas are using [sparse files](https://wiki.archlinux.org/index.php/Sparse_file) to store data and this value is the apparent size of the sparse files (the maximum size to which they may expand). The actual size used by each replica is not equal to this nominal size. +- Based on this nominal size, the replicas will be scheduled to those nodes that have enough allocatable space during the volume creation. (See [this doc](../node-space-usage) for more info about node allocation size.) +- The value of nominal size determines the max available space when the volume is in use. In other words, the current active data size hold by a volume cannot be greater than its nominal size. + +## Volume `Actual Size` +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-actual-size.png" >}} +- The actual size indicates the actual space used by **each** replica on the corresponding node. +- Since all historical data stored in the snapshots and active data will be calculated into the actual size, the final value can be greater than the nominal size. +- The actual size will be shown only when the volume is running. + +## Example + +In the example, we will explain how volume `size` and `actual size` get changed after a bunch of IO and snapshot related operations. + +> The illustration presents the file organization of **one replica**. The volume head and snapshots are actually sparse files, which we mentioned above. + +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration.png" >}} + + +1. Create a 12 Gi volume with a single replica, then attach and mount it on a node. See Figure 1 of the illustration. + - For the empty volume, the nominal `size` is 12 Gi and the `actual size` is almost 0. + - There is some meta info in the volume hence the `actual size` is 260 Mi and is not exactly 0. + +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig1.png" >}} + +2. Write 4 Gi data (data#0) in the volume mount point. The `actual size` is increased by 4 Gi because of the allocated blocks in the replica for the 4 Gi data. Meanwhile, `df` command in the filesystem also shows the 4 Gi used space. See Figure 2 of the illustration. + +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig2.png" >}} + +3. Delete the 4 Gi data. Then, `df` command shows that the used space of the filesystem is nearly 0, but the `actual size` is unchanged. 
+ + > Users can see by default the volume `actual size` is not shrunk after deleting the 4 Gi data. Longhorn is a block-level storage system. Therefore, the deletion in the filesystem only marks the blocks that belong to the deleted file as unused. Currently, Longhorn will not apply TRIM/UNMAP operations automatically/periodically. if you want to do filesystem trim, please check [this doc](../trim-filesystem) for details. + +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig2.png" >}} + +4. Then, rewrite the 4 Gi data (data#1), and the `df` command in the filesystem shows 4 Gi used space again. However, the `actual size` is increased by 4 Gi and becomes 8.25Gi. See Figure 3(a) of the illustration. + + > After deletion, filesystem may or maynot reuse the recently freed blocks from recently deleted files according to the filesystem design and please refer to [Block allocation strategies of various filesystems](https://www.ogris.de/blkalloc). If the volume nominal `size` is 12 Gi, the `actual size` in the end would range from 4 Gi to 8 Gi since the filesystem may or maynot reuse the freed blocks. On the other hand, if the volume nominal `size` is 6 Gi, the `actual size` at the end would range from 4 Gi to 6 Gi, because the filesystem has to reuse the freed blocks in the 2nd round of writing. See Figure 3(b) of the illustration. + > + > Thus, allocating an appropriate nominal `size` for a volume that holds heavy writing tasks according to the IO pattern would make disk space usage more efficient. + +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig3.png" >}} + +5. Take a snapshot (snapshot#1). See Figure 4 of the illustration. + - Now data#1 is stored in snapshot#1. + - The new volume head size is almost 0. + - With the volume head and the snapshot included, the `actual size` remains 8.25 Gi. + +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig4.png" >}} + +6. Delete data#1 from the mount point. + - The data#1 filesystem level removal info is stored in current volume head file. For snapshot#1, data#1 is still retained as the historical data. + - The `actual size` is still 8.25 Gi. + +7. Write 8 Gi data (data#2) in the volume mount, then take one more snapshot (snapshot#2). See Figure 5 of the illustration. + - Now the `actual size` is 16.2 Gi, which is greater than the volume nominal `size`. + - From a filesystem's perspective, the overlapping part between the two snapshots is considered as the blocks that have to be reused or overwritten. But in terms of Longhorn, these blocks are actually fresh ones held in another snapshot/volume head. See the 2 snapshots in Figure 6. + + > The volume head holds the latest data of the volume only, while each snapshot may store historical data as well as active data, which consumes at most size space. Therefore, the volume `actual size`, which is the size sum of the volume head and all snapshots, is possibly bigger than the size specified by users. + > + > Even if users will not take snapshots for volumes, there are operations like rebuilding, expansion, or backing up that would lead to system (hidden) snapshot creation. As a result, volume `actual size` being larger than size is unavoidable under some use cases. + +{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig5.png" >}} + +8. Delete snapshot#1 and wait for snapshot purge complete. See Figure 7 of the illustration. + - Here Longhorn actually coalesces the snapshot#1 with the snapshot#2. 
+    - For the overlapping part during coalescing, the newer data (data#2) is retained in the blocks. Some historical data is then removed and the volume shrinks (from 16.2 Gi to 11.4 Gi in the example).
+
+{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig6.png" >}}
+
+9. Delete all existing data (data#2) and write 11.5 Gi data (data#3) in the volume mount. See Figure 8 of the illustration.
+    - This makes the volume head actual size 11.5 Gi and the volume total actual size 22.9 Gi.
+
+{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig7.png" >}}
+
+10. Try to delete the only snapshot (snapshot#2) of the volume. See Figure 9 of the illustration.
+    - The snapshot directly behind the volume head cannot be cleaned up.
+      If users try to delete this kind of snapshot, Longhorn marks it as Removing, hides it, then tries to free the part of the snapshot file that overlaps with the volume head.
+      This last operation is called snapshot pruning in Longhorn and has been available since v1.3.0.
+    - Since in the example both the snapshot and the volume head use up most of the nominal space, the overlapping part almost equals the snapshot actual size. After the pruning, the snapshot actual size is down to 259 Mi and the volume shrinks from 22.9 Gi to 11.8 Gi.
+
+{{< figure src="/img/screenshots/volumes-and-nodes/volume-size-illustration-fig8.png" >}}
+
+To summarize the important points about disk space usage in this example:
+
+- Unused blocks are not released
+
+  Longhorn does not issue TRIM/UNMAP operations automatically, so deleting files from the filesystem does not shrink the volume actual size. You may need to check [the doc](../trim-filesystem) and trim the filesystem yourself if needed.
+
+- Freed blocks are not necessarily reused
+
+  Deleting files and then writing new ones can keep increasing the actual size, because the filesystem may not reuse the recently freed blocks. Thus, allocating an appropriate nominal size for a volume that holds heavy writing tasks, based on its IO pattern, makes disk space usage more efficient.
+
+- Deleting snapshots can eliminate the overlapping part of the used blocks, regardless of whether those blocks were recently released by the filesystem or still contain historical data.
+
+## Space Configuration Suggestions for Volumes
+
+1. Reserve enough free disk space as a buffer in case the actual size of existing volumes keeps growing.
+    - A general estimation for the maximum space consumption of a volume is
+
+      ```
+      (N + 1) x head/snapshot average actual size
+      ```
+
+    - where `N` is the total number of snapshots the volume contains (including the volume head), and the extra `1` is for the temporary space that may be required by snapshot deletion.
+    - The average actual size of the snapshots varies and depends on the use case.
+      If snapshots are created periodically for a volume (e.g. by snapshot recurring jobs), the average value would be the average amount of data modified in the volume during the snapshot creation interval.
+      If there are heavy writing tasks for a volume, the head/snapshot average actual size would approach the volume nominal size. In this case, it's better to set [`Storage Over Provisioning Percentage`](../../references/settings/#storage-over-provisioning-percentage) to less than 100% to avoid disk space exhaustion.
+    - Some extended cases (a brief worked example follows this list):
+        - There is one snapshot recurring job whose retain number is `N`. Then the formula can be extended to:
+
+          ```
+          (M + N + 1 + 1 + 1 + 1) x head/snapshot average actual size
+          ```
+
+        - The explanation of the formula:
+            - `M` is the number of snapshots created manually by users. Recurring jobs are not responsible for removing this kind of snapshot; they can be deleted only by users.
+            - `N` is the snapshot recurring job retain number.
+            - The 1st `1` is the volume head.
+            - The 2nd `1` is the extra snapshot created by the recurring job. The recurring job always creates a new snapshot first, then deletes the oldest one once the snapshots it created exceed the retain number, so there is one extra snapshot taking disk space before the deletion finishes.
+            - The 3rd `1` is the system snapshot. When a rebuild is triggered or an expansion is issued, Longhorn creates a system snapshot before starting the operation, and this system snapshot may not be cleaned up immediately.
+            - The 4th `1` is for the temporary space that may be required by snapshot deletion/purge.
+        - Users do not want snapshots at all: no snapshot is created manually and no recurring job is launched. Assuming the setting [_Automatically Cleanup System Generated Snapshot_](../../references/settings/#automatically-cleanup-system-generated-snapshot) is enabled, the formula becomes:
+
+          ```
+          (1 + 1 + 1) x head/snapshot average actual size
+          ```
+
+        - The worst case that leads to this much space usage:
+            1. At some point the 1st rebuilding/expansion is triggered, which leads to the creation of the 1st system snapshot.
+                - The purges before and after the 1st rebuilding do nothing.
+            2. Data is written to the new volume head, and then the 2nd rebuilding/expansion is triggered.
+                - The snapshot purge before the 2nd rebuilding may shrink the 1st system snapshot.
+                - Then the 2nd system snapshot is created and the rebuilding starts.
+                - After the rebuilding is done, the subsequent snapshot purge coalesces the 2 system snapshots. This coalescing requires temporary space.
+            3. While the snapshot purge for the 2nd rebuilding is in progress, more data is written to the new volume head.
+        - The explanation of the formula:
+            - The 1st `1` is the volume head.
+            - The 2nd `1` is the second system snapshot mentioned in the worst case.
+            - The 3rd `1` is for the temporary space that may be required by the purge/coalescing of the 2 system snapshots.
+
+2. Do not retain too many snapshots for a volume.
+
+3. Cleaning up snapshots helps reclaim disk space. There are two ways to clean up snapshots:
+    - Delete the snapshots manually via the Longhorn UI.
+    - Set a snapshot recurring job with a retain number of 1, so that the snapshots are cleaned up automatically.
+
+    Also, note that extra space, up to the volume nominal `size`, is required during snapshot cleanup and merge.
+
+4. Set an appropriate volume nominal `size` according to the workloads.
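+
+As a brief worked example of the estimation above: suppose a volume has one recurring snapshot job with a retain number of 7, no manually created snapshots, and roughly 2 Gi of data modified per snapshot interval. The numbers below are made up for illustration, and the `kubectl` field paths shown are assumptions that may differ between Longhorn versions.
+
+```bash
+# Hypothetical numbers: M=0 manual snapshots, retain number N=7,
+# ~2 Gi of modified data per interval (head/snapshot average actual size).
+M=0; N=7; AVG_GI=2
+echo "worst-case usage: $(( (M + N + 1 + 1 + 1 + 1) * AVG_GI )) Gi"   # (0+7+4) x 2 = 22 Gi
+
+# One way to compare a volume's nominal size with its actual size (both in bytes).
+# Replace <volume-name> with a real volume name; the status field path is an assumption.
+kubectl -n longhorn-system get volumes.longhorn.io <volume-name> \
+  -o jsonpath='{.spec.size}{"\n"}{.status.actualSize}{"\n"}'
+```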
diff --git a/content/docs/1.5.1/volumes-and-nodes/workload-identification.md b/content/docs/1.5.1/volumes-and-nodes/workload-identification.md new file mode 100644 index 000000000..c5f42877f --- /dev/null +++ b/content/docs/1.5.1/volumes-and-nodes/workload-identification.md @@ -0,0 +1,48 @@ +--- +title: Viewing Workloads that Use a Volume +weight: 2 +---
+
+Users can identify the current workloads or workload history for existing Longhorn persistent volumes (PVs), as well as their history of being bound to persistent volume claims (PVCs).
+
+From the Longhorn UI, go to the **Volume** tab. Each Longhorn volume is listed on the page. The **Attached To** column displays the name of the workload using the volume. If you click the workload name, you will see more details, including the workload type, pod name, and pod status.
+
+Workload information is also available on the Longhorn volume detail page. To see the details, click the volume name:
+
+```
+State: attached
+...
+Namespace:default
+PVC Name:longhorn-volv-pvc
+PV Name:pvc-0edf00f3-1d67-4783-bbce-27d4458f6db7
+PV Status:Bound
+Pod Name:teststatefulset-0
+Pod Status:Running
+Workload Name:teststatefulset
+Workload Type:StatefulSet
+```
+
+## History
+
+After the workload is no longer using the Longhorn volume, the volume detail page shows the historical status of the most recent workload that used the volume:
+
+```
+Last time used by Pod: a few seconds ago
+...
+Last Pod Name: teststatefulset-0
+Last Workload Name: teststatefulset
+Last Workload Type: Statefulset
+```
+
+If these fields are set, they indicate that no workload is currently using this volume.
+
+When a PVC is no longer bound to the volume, the following status is shown:
+
+```
+Last time bound with PVC:a few seconds ago
+Last time used by Pod:32 minutes ago
+Last Namespace:default
+Last Bounded PVC Name:longhorn-volv-pvc
+```
+
+If the `Last time bound with PVC` field is set, there is currently no PVC bound to this volume. The related fields show the most recent workload that used this volume.
diff --git a/content/docs/1.5.1/what-is-longhorn.md b/content/docs/1.5.1/what-is-longhorn.md new file mode 100644 index 000000000..2ecfb8232 --- /dev/null +++ b/content/docs/1.5.1/what-is-longhorn.md @@ -0,0 +1,46 @@ +--- +title: What is Longhorn? +weight: 1 +---
+Longhorn is a lightweight, reliable, and easy-to-use distributed block storage system for Kubernetes.
+
+Longhorn is free, open source software. Originally developed by Rancher Labs, it is now being developed as an incubating project of the Cloud Native Computing Foundation.
+
+With Longhorn, you can:
+
+- Use Longhorn volumes as persistent storage for the distributed stateful applications in your Kubernetes cluster
+- Partition your block storage into Longhorn volumes so that you can use Kubernetes volumes with or without a cloud provider
+- Replicate block storage across multiple nodes and data centers to increase availability
+- Store backup data in external storage such as NFS or AWS S3
+- Create cross-cluster disaster recovery volumes so that data from a primary Kubernetes cluster can be quickly recovered from backup in a second Kubernetes cluster
+- Schedule recurring snapshots of a volume, and schedule recurring backups to NFS or S3-compatible secondary storage
+- Restore volumes from backup
+- Upgrade Longhorn without disrupting persistent volumes
+
+Longhorn comes with a standalone UI, and can be installed using Helm, kubectl, or the Rancher app catalog.
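+
+For example, a typical Helm-based installation looks roughly like the following sketch. The chart repository URL, chart name, and namespace shown here are the commonly used defaults; consult the installation section of this documentation for the exact commands for your Longhorn version.
+
+```bash
+# Add the Longhorn chart repository and install Longhorn into its own namespace.
+helm repo add longhorn https://charts.longhorn.io
+helm repo update
+helm install longhorn longhorn/longhorn \
+  --namespace longhorn-system \
+  --create-namespace
+```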
+
+### Simplifying Distributed Block Storage with Microservices
+
+Because modern cloud environments require tens of thousands to millions of distributed block storage volumes, some storage controllers have become highly complex distributed systems. By contrast, Longhorn can simplify the storage system by partitioning a large block storage controller into a number of smaller storage controllers, as long as the volumes can still be built from a common pool of disks. By using one storage controller per volume, Longhorn turns each volume into a microservice. The controller is called the Longhorn Engine.
+
+The Longhorn Manager component orchestrates the Longhorn Engines, so they work together coherently.
+
+### Use Persistent Storage in Kubernetes without Relying on a Cloud Provider
+
+Pods can reference storage directly, but this is not recommended because it doesn't allow the Pod or container to be portable. Instead, the workloads' storage requirements should be defined in Kubernetes Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). With Longhorn, you can specify the size of the volume, the number of synchronous replicas, and other volume-specific configurations you want across the hosts that supply the storage resource for the volume. Then your Kubernetes resources can use the PVC and corresponding PV for each Longhorn volume, or use a Longhorn storage class to automatically create a PV for a workload.
+
+Replicas are thin-provisioned on the underlying disks or network storage.
+
+### Schedule Multiple Replicas across Multiple Compute or Storage Hosts
+
+To increase availability, Longhorn creates replicas of each volume. Replicas contain a chain of snapshots of the volume, with each snapshot storing the change from a previous snapshot. Each replica of a volume also runs in a container, so a volume with three replicas results in four containers.
+
+The number of replicas for each volume is configurable in Longhorn, as well as the nodes where replicas will be scheduled. Longhorn monitors the health of each replica and performs repairs, rebuilding the replica when necessary.
+
+### Assign Multiple Storage Frontends for Each Volume
+
+Common frontends include a Linux kernel device (mapped under `/dev/longhorn`) and an iSCSI target.
+
+### Specify Schedules for Recurring Snapshot and Backup Operations
+
+You can specify the frequency of these operations (hourly, daily, weekly, monthly, and yearly), the exact time at which these operations are performed (e.g., 3:00am every Sunday), and how many recurring snapshots and backup sets are kept.
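+
+As an illustrative sketch, a recurring backup at 3:00am every Sunday that keeps four backup sets could be declared with a `RecurringJob` resource similar to the one below. The field names follow recent Longhorn examples and may differ between versions, so check the snapshot and backup documentation for the authoritative schema.
+
+```bash
+# Sketch of a recurring backup job: run at 3:00am every Sunday, keep 4 backup sets.
+# The RecurringJob fields shown here are assumptions based on recent Longhorn releases.
+kubectl apply -f - <<EOF
+apiVersion: longhorn.io/v1beta2
+kind: RecurringJob
+metadata:
+  name: backup-weekly
+  namespace: longhorn-system
+spec:
+  name: backup-weekly
+  task: "backup"
+  cron: "0 3 * * 0"   # 3:00am every Sunday
+  retain: 4
+  concurrency: 2
+  groups:
+    - default
+EOF
+```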