docs: add 1.5.1 draft doc
Signed-off-by: David Ko <dko@suse.com>
innobead committed Jun 28, 2023
1 parent bd304d0 commit 026effa
Showing 117 changed files with 10,361 additions and 1 deletion.

2 changes: 1 addition & 1 deletion config.toml
@@ -14,7 +14,7 @@ style = "paraiso-dark"
[params]
description = "Cloud native distributed block storage for Kubernetes"
tagline = "Easy to use, 100% open source, run anywhere"
versions = ["1.4.2", "1.4.1", "1.4.0", "1.3.3", "1.3.2", "1.3.1", "1.3.0", "1.2.6", "1.1.3", "1.6.0-dev", "1.5.0-dev", "1.4.3-dev", "1.3.4-dev"]
versions = ["1.5.0", "1.4.2", "1.4.1", "1.4.0", "1.3.3", "1.3.2", "1.3.1", "1.3.0", "1.2.6", "1.1.3", "1.6.0-dev", "1.5.1-dev", "1.4.3-dev", "1.3.4-dev"]
archived_versions = ["1.2.5", "1.2.4", "1.2.3", "1.2.2", "1.2.1", "1.2.0", "1.1.2", "1.1.1", "1.1.0", "0.8.0", "0.8.1"]
alpine_js_version = "2.2.5"
locale = "en_US"
18 changes: 18 additions & 0 deletions content/docs/1.5.1/_index.md
@@ -0,0 +1,18 @@
---
title: The Longhorn Documentation
description: Cloud native distributed block storage for Kubernetes
weight: 1
---

**Longhorn** is a lightweight, reliable, and powerful distributed [block storage](https://cloudacademy.com/blog/object-storage-block-storage/) system for Kubernetes.

Longhorn implements distributed block storage using containers and microservices. Longhorn creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes. The storage controller and replicas are themselves orchestrated using Kubernetes.

## Features

* Enterprise-grade distributed block storage with no single point of failure
* Incremental snapshot of block storage
* Backup to secondary storage ([NFS](https://www.extrahop.com/resources/protocols/nfs/) or [S3](https://aws.amazon.com/s3/)-compatible object storage) built on efficient change block detection
* Recurring snapshots and backups
* Automated, non-disruptive upgrades. You can upgrade the entire Longhorn software stack without disrupting running storage volumes.
* An intuitive GUI dashboard
4 changes: 4 additions & 0 deletions content/docs/1.5.1/advanced-resources/_index.md
@@ -0,0 +1,4 @@
---
title: Advanced Resources
weight: 70
---
163 changes: 163 additions & 0 deletions content/docs/1.5.1/advanced-resources/backing-image.md
@@ -0,0 +1,163 @@
---
title: Backing Image
weight: 4
---

Longhorn has natively supported backing images since v1.1.1.

A QCOW2 or RAW image can be set as the backing/base image of a Longhorn volume, which allows Longhorn to be integrated with VM management software like [Harvester](https://github.com/rancher/harvester).

## Create Backing Image

### Parameters during creation

#### The data source of a backing image
There are three kinds of data sources; in other words, there are three ways to prepare the data of a backing image file:
1. Download a file from a URL as a backing image.
2. Upload a local file as a backing image. This is available almost exclusively through the UI.
3. Export an existing in-cluster volume as a backing image.

#### The checksum of a backing image
- The checksum of a backing image is **the SHA512 checksum** of the whole backing image **file** rather than that of the actual content.
What's the difference? When Longhorn calculates the checksum of a QCOW2 file, it reads the file as a raw file instead of using the qcow library to read the actual content. In other words, users always get the correct checksum by executing `shasum -a 512 <the file path>` regardless of the file format.
- It's recommended to provide the expected checksum during backing image creation.
Otherwise, Longhorn considers the checksum of the first prepared file as the correct one. If something goes wrong during the first file preparation and yields an incorrect checksum as the expected value, the backing image becomes unusable.
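For instance, the expected checksum can be computed ahead of time with standard tools before creating the backing image (the file name below is illustrative; `shasum -a 512` produces the same digest):

```shell
# The checksum is computed over the raw bytes of the file,
# regardless of whether it is a QCOW2 or RAW image.
# "parrot.qcow2" is only an example file name.
sha512sum parrot.qcow2 | cut -d' ' -f1
```

The printed digest is what you would supply as the expected checksum during creation.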

### The way of creating a backing image

#### Create a backing image via Longhorn UI
On the **Setting > Backing Image** page, users can create backing images with any kind of data source.

#### Create a backing image via YAML
You can download a file or export an existing volume as a backing image via YAML.
It's better not to "upload" a file via YAML; otherwise, you would need to handle the data upload manually via HTTP requests.

Here are some examples:
```yaml
apiVersion: longhorn.io/v1beta2
kind: BackingImage
metadata:
  name: bi-download
  namespace: longhorn-system
spec:
  sourceType: download
  sourceParameters:
    url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.raw
  checksum: 304f3ed30ca6878e9056ee6f1b02b328239f0d0c2c1272840998212f9734b196371560b3b939037e4f4c2884ce457c2cbc9f0621f4f5d1ca983983c8cdf8cd9a
```
```yaml
apiVersion: longhorn.io/v1beta2
kind: BackingImage
metadata:
  name: bi-export
  namespace: longhorn-system
spec:
  sourceType: export-from-volume
  sourceParameters:
    volume-name: vol-export-src
    export-type: qcow2
```

#### Create and use a backing image via StorageClass and PVC
1. Specify the backing image in a Longhorn StorageClass.
2. Setting the parameter `backingImage` asks Longhorn to use this backing image during volume creation.
3. If you want the backing image to be created when it does not yet exist during CSI volume creation, the parameters `backingImageDataSourceType` and `backingImageDataSourceParameters` should be set as well. As with YAML, it's better not to create a backing image via `upload` in a StorageClass. Note that if all of these parameters are set and the backing image already exists, Longhorn will validate that the parameters match the existing ones before using it.
   - For `download`:
     ```yaml
     kind: StorageClass
     apiVersion: storage.k8s.io/v1
     metadata:
       name: longhorn-backing-image-example
     provisioner: driver.longhorn.io
     allowVolumeExpansion: true
     reclaimPolicy: Delete
     volumeBindingMode: Immediate
     parameters:
       numberOfReplicas: "3"
       staleReplicaTimeout: "2880"
       backingImage: "bi-download"
       backingImageDataSourceType: "download"
       backingImageDataSourceParameters: '{"url": "https://backing-image-example.s3-region.amazonaws.com/test-backing-image"}'
       backingImageChecksum: "SHA512 checksum of the backing image"
     ```
   - For `export-from-volume`:
     ```yaml
     kind: StorageClass
     apiVersion: storage.k8s.io/v1
     metadata:
       name: longhorn-backing-image-example
     provisioner: driver.longhorn.io
     allowVolumeExpansion: true
     reclaimPolicy: Delete
     volumeBindingMode: Immediate
     parameters:
       numberOfReplicas: "3"
       staleReplicaTimeout: "2880"
       backingImage: "bi-export-from-volume"
       backingImageDataSourceType: "export-from-volume"
       backingImageDataSourceParameters: '{"volume-name": "vol-export-src", "export-type": "qcow2"}'
     ```

4. Create a PVC with the StorageClass. The backing image will then be created (together with the Longhorn volume) if it does not exist.
5. Longhorn starts preparing the backing image files on the disks for the replicas when a volume using the backing image is attached to a node.
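For reference, a minimal PVC using the StorageClass above might look like this (the PVC name and requested size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backing-image-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-backing-image-example
  resources:
    requests:
      storage: 2Gi
```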

#### Notice:
- Be careful with the escape character `\` when you input a download URL in a StorageClass.

## Utilize a backing image in a volume

Users can [directly create then immediately use a backing image via StorageClass](./#create-and-use-a-backing-image-via-storageclass-and-pvc),
or utilize an existing backing image as mentioned below.

#### Use an existing backing image
##### Use an existing backing image during volume creation
1. Click **Setting > Backing Image** in the Longhorn UI.
2. Click **Create Backing Image** to create a backing image with a unique name and a valid URL.
3. During the volume creation, specify the backing image from the backing image list.
4. Longhorn starts to download the backing image to disks for the replicas when a volume using the backing image is attached to a node.

##### Use an existing backing image during volume restore
1. Click **Backup** and pick a backup volume for the restore.
2. If the backing image is already set for the backup volume, Longhorn will automatically choose it during the restore.
3. Longhorn allows you to re-specify or override the backing image during the restore.

#### Download the backing image file to the local machine
Since v1.3.0, users can download existing backing image files to their local machines via the UI.

#### Notice:
- Users need to make sure the backing image exists when they use the UI to create or restore a volume with a backing image specified.
- Before downloading an existing backing image file to a local machine, users need to make sure the backing image has at least one ready file.

## Clean up backing images

#### Clean up backing images in disks
- Longhorn automatically cleans up the unused backing image files in the disks based on [the setting `Backing Image Cleanup Wait Interval`](../../references/settings#backing-image-cleanup-wait-interval). However, Longhorn will always retain at least one file in a disk for each backing image.
- The unused backing images can also be cleaned up manually via the Longhorn UI: click **Setting > Backing Image > Operation list of one backing image > Clean Up**, then choose disks.
- If any replica in a disk uses a backing image, the backing image file in that disk cannot be cleaned up, regardless of the replica's current state.

#### Delete backing images
- The backing image can be deleted only when there is no volume using it.

## Backing image recovery
- If there is still a ready backing image file in one disk, Longhorn will automatically clean up the failed backing image files and then re-launch those files from the ready one.
- If somehow all files of a backing image become failed, and the first file was:
  - downloaded from a URL, Longhorn will restart the download.
  - exported from an existing volume, Longhorn will restart the export (attaching the volume first if necessary).
  - uploaded from the user's local environment, there is no way to recover it. Users need to delete this backing image and re-create it by re-uploading the file.
- When a node is down or the backing image manager pod on the node is unavailable, all backing image files on the node become `unknown`. Later, if the node comes back and the pod is running, Longhorn will detect and reuse the existing files automatically.

## Backing Image Workflow
1. To manage all backing image files in a disk, Longhorn creates one backing image manager pod for each disk. Once the disk has no backing image file requirement, the backing image manager is removed automatically.
2. Once a backing image file is prepared by the backing image manager for a disk, the file is shared among all volume replicas in that disk.
3. When a backing image is created, Longhorn launches a backing image data source pod to prepare the first file. The file data comes from the data source users specified (download from a remote URL, upload from a local machine, or export from a volume). After the preparation is done, the backing image manager pod in the same disk takes over the file, and Longhorn stops the backing image data source pod.
4. Once a new backing image is used by a volume, the backing image manager pods in the disks where the volume replicas reside are asked to sync the file from the backing image manager pods that already contain the file.
5. As mentioned in [#clean-up-backing-images-in-disks](#clean-up-backing-images-in-disks), the file is cleaned up automatically if no replica in a disk uses the backing image file.

## Warning
- The download URL of the backing image should be publicly accessible. We will improve this part in the future.
- If one backing image manager pod shows high memory usage after [a file download](#download-the-backing-image-file-to-the-local-machine), this is caused by the system cache/buffer. The memory usage will decrease automatically, so you don't need to worry about it. See [the GitHub ticket](https://github.com/longhorn/longhorn/issues/4055) for more details.

## History
* Available since v1.1.1: [Enable backing image feature in Longhorn](https://github.com/longhorn/longhorn/issues/2006)
* Support [upload](https://github.com/longhorn/longhorn/issues/2404) and [volume exporting](https://github.com/longhorn/longhorn/issues/2403) since v1.2.0.
* Support [download to local](https://github.com/longhorn/longhorn/issues/2404) and [volume exporting](https://github.com/longhorn/longhorn/issues/3155) since v1.3.0.
4 changes: 4 additions & 0 deletions content/docs/1.5.1/advanced-resources/data-recovery/_index.md
@@ -0,0 +1,4 @@
---
title: Data Recovery
weight: 4
---
@@ -0,0 +1,33 @@
---
title: Identifying Corrupted Replicas
weight: 3
---

If one of the disks used by Longhorn has gone bad, you might experience intermittent input/output errors when using a Longhorn volume.

For example, a file sometimes cannot be read, but later it can. In this scenario, it's likely that one of the disks went bad, resulting in one of the replicas returning incorrect data to the user.

To recover the volume, we can identify the corrupted replica and remove it from the volume:

1. Scale down the workload to detach the volume.
2. Find all the replicas' locations by checking the Longhorn UI. The directories used by the replicas will be shown as a tooltip for each replica in the UI.
3. Log in to each node that contains a replica of the volume and get to the directory that contains the replica data.

For example, the replica might be stored at:

   ```
   /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2
   ```
4. Run a checksum for every file under that directory.

For example:

```
# sha512sum /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/*
fcd1b3bb677f63f58a61adcff8df82d0d69b669b36105fc4f39b0baf9aa46ba17bd47a7595336295ef807769a12583d06a8efb6562c093574be7d14ea4d6e5f4 /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/revision.counter
c53649bf4ad843dd339d9667b912f51e0a0bb14953ccdc9431f41d46c85301dff4a021a50a0bf431a931a43b16ede5b71057ccadad6cf37a54b2537e696f4780 /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/volume-head-000.img
f6cd5e486c88cb66c143913149d55f23e6179701f1b896a1526717402b976ed2ea68fc969caeb120845f016275e0a9a5b319950ae5449837e578665e2ffa82d0 /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/volume-head-000.img.meta
e6f6e97a14214aca809a842d42e4319f4623adb8f164f7836e07dc8a3f4816a0389b67c45f7b0d9f833d50a731ae6c4670ba1956833f1feb974d2d12421b03f7 /var/lib/longhorn/replicas/pvc-06b4a8a8-b51d-42c6-a8cc-d8c8d6bc65bc-d890efb2/volume.meta
```

5. Compare the output of each replica. One of them should fail or produce results different from the others; that is the replica we need to remove from the volume.
6. Use the Longhorn UI to remove the identified replica from the volume.
7. Scale up the workload to make sure the error is gone.
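The comparison in step 5 can be scripted. A minimal sketch, assuming the `sha512sum` output collected from each node was saved to files such as `rep1.txt`, `rep2.txt`, and `rep3.txt` (file names are illustrative):

```shell
# Fingerprint each replica's checksum list, then count how many
# replicas share each fingerprint; the fingerprint that appears
# only once belongs to the corrupted replica.
for f in rep1.txt rep2.txt rep3.txt; do
  echo "$f $(sha512sum < "$f" | cut -d' ' -f1)"
done | sort -k2 | uniq -c -f1 | awk '$1 == 1 {print $2}'
```

This prints the name of the file (and hence the replica) whose checksums disagree with the majority.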
38 changes: 38 additions & 0 deletions content/docs/1.5.1/advanced-resources/data-recovery/data-error.md
@@ -0,0 +1,38 @@
---
title: Identifying and Recovering from Data Errors
weight: 1
---

If you've encountered an error message like the following:

```
'fsck' found errors on device /dev/longhorn/pvc-6288f5ea-5eea-4524-a84f-afa14b85780d but could not correct them.
```

Then you have a data corruption situation. This section describes how to address the issue.

## Bad Underlying Disk

To determine if the error is caused because one of the underlying disks went bad, follow [these steps](../corrupted-replica) to identify corrupted replicas.

If most of the replicas on the disk went bad, the disk is unreliable and should be replaced.

If only one replica on the disk went bad, it can be a case of `bit rot`. In this case, removing the replica is good enough.

## Recover from a Snapshot

If all the replicas are identical, then the volume needs to be recovered using snapshots.

The likely reason is that the bad data was written by the workload the volume was attached to.

To revert to a previous snapshot:

1. In maintenance mode, attach the volume to any node.
2. Revert to a snapshot. You should start with the latest one.
3. Detach the volume.
4. Re-attach the volume, this time not in maintenance mode, to a node you have access to.
5. Mount the volume from `/dev/longhorn/<volume_name>` and check the volume content.
6. If the volume content is still incorrect, repeat from step 1.
7. Once you find a usable snapshot, make a new snapshot from there and start using the volume as normal.

## Recover from Backup

If all of the methods above failed, use a backup to [recover the volume.](../../../snapshots-and-backups/backup-and-restore/restore-from-a-backup)