LXD fails to pickup non-pristine disks #142

Open
ethanmye-rs opened this issue Jul 20, 2023 · 16 comments
Labels
Feature New feature, not a bug

Comments

@ethanmye-rs
Member

In the microcloud init screen, the wizard seems to fail to pick up non-pristine disks. It offers to wipe the disk in the next screen, so I assume this is a bug. If I wipe a non-pristine disk with:

sudo wipefs -a /dev/sdb && sudo dd if=/dev/zero of=/dev/sdb bs=4096 count=100 > /dev/null

then microcloud picks up the disk next time the wizard is run.
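A quick way to double-check that the disk really does look pristine before re-running the wizard (the device name here is just an example):

sudo wipefs /dev/sdb   # without -a this only lists signatures; no output means none are left
lsblk -f /dev/sdb      # the FSTYPE and LABEL columns should be empty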

@masnax
Contributor

masnax commented Jul 20, 2023

At the moment, MicroCloud won't pick up any partitioned disks. That will definitely change in the near-ish future.

@masnax
Contributor

masnax commented Nov 20, 2023

We're close to having support for partitions on local (zfs) storage, but it seems ceph might take a bit longer:
canonical/microceph#251

For ZFS, we'll be able to add partition support once canonical/lxd#12537 is merged in LXD.

@tomponline
Member

@masnax WRT canonical/lxd#12537, why do we need to ascertain whether the partition is mounted? Isn't MicroCloud only showing empty partitions anyway?

@roosterfish added the Feature (New feature, not a bug) label Nov 22, 2023
@dnebing

dnebing commented Nov 23, 2023

Because I couldn't add partitions as local storage during the microcloud init command, I chose "no" when asked about adding local storage. MicroCloud completed initialization and it all looks great.

Is there a command I can execute to manually create the local storage pool and add the partitions from the cluster nodes?

At least until this new feature is ready?

@masnax
Contributor

masnax commented Nov 23, 2023

Because I couldn't add partitions as local storage during the microcloud init command, I chose "no" when asked about adding local storage. MicroCloud completed initialization and it all looks great.

Is there a command I can execute to manually create the local storage pool and add the partitions from the cluster nodes?

At least until this new feature is ready?

Sure, to create a local zfs storage pool like MicroCloud would, you can do the following:

Once on each system:

lxc storage create local zfs source=${disk_path} --target ${cluster_member_name}

And finally, from any system:

lxc storage create local zfs
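For a concrete (hypothetical) three-member cluster, the full sequence would look roughly like this, with the disk path adjusted per member:

# run once per cluster member, pointing at that member's own disk (names and paths are examples)
lxc storage create local zfs source=/dev/disk/by-id/nvme-example-disk1 --target micro1
lxc storage create local zfs source=/dev/disk/by-id/nvme-example-disk2 --target micro2
lxc storage create local zfs source=/dev/disk/by-id/nvme-example-disk3 --target micro3

# then, from any one member, finish creating the pending pool
lxc storage create local zfs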

@dnebing

dnebing commented Nov 23, 2023

Thanks for that, extremely helpful!

I noticed in the docs that there are default volumes (backups, images) tied to the target systems.

Are those required, or should I just skip them?
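If it turns out they're wanted, I'm guessing the manual equivalent is something along these lines; the volume names and the per-member --target usage are assumptions on my part rather than anything MicroCloud-specific:

# create the custom volumes on the local pool for each cluster member
lxc storage volume create local backups --target node1
lxc storage volume create local images --target node1

# point the server at them (these keys are per-member server config)
lxc config set storage.backups_volume=local/backups --target node1
lxc config set storage.images_volume=local/images --target node1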

@masnax
Contributor

masnax commented Nov 23, 2023

@masnax WRT canonical/lxd#12537, why do we need to ascertain whether the partition is mounted? Isn't MicroCloud only showing empty partitions anyway?

There's no way MicroCloud can know if the partitions are empty without LXD's super-privileges. So no, it will list every single partition on the system. The list is ripped straight from lxd info --resources.
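If you want to see the same raw list MicroCloud works from, something like this should show it (the jq path into the resources structure is my assumption):

# dump the server's resource list and print disk and partition IDs
lxc query /1.0/resources | jq -r '.storage.disks[] | .id, (.partitions[]?.id)'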

@tomponline
Member

@masnax I commented over at canonical/lxd#12537 (review)

@tomponline
Member

MicroCeph support for partitions is being tracked here canonical/microceph#251

@rmbleeker

Sure, to create a local zfs storage pool like MicroCloud would, you can do the following:

Once on each system:

lxc storage create local zfs source=${disk_path} --target ${cluster_member_name}

And finally, from any system:

lxc storage create local zfs

When I follow these instructions to the letter, or even when I add sudo, I always get the same error:

Error: Failed to run: zpool create -m none -O compression=on local /dev/disk/by-id/mmc-BJTD4R_0x5edae852-part3: exit status 1 (invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/mmc-BJTD4R_0x5edae852-part3 is part of active pool 'local')

This is on a TuringPi 2 cluster board with 4 Turing RK1 nodes (Rockchip RK3588 based compute modules with 32GB of eMMC storage). The nodes were freshly imaged, and the 3rd partition was newly created on all of them using parted before the nodes were first powered on, to prevent the second partition (the root partition) from growing to the full size of the eMMC storage. What am I missing?
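For reference, the third partition was created with something along these lines (the device name and size boundary here are only illustrative, not the exact values used):

# cap the root partition at an assumed 8GiB and give the rest of the eMMC to a new partition
sudo parted --script /dev/mmcblk0 mkpart primary 8GiB 100%
sudo parted --script /dev/mmcblk0 print   # verify the resulting layout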

@roosterfish
Contributor

roosterfish commented Oct 22, 2024

What am I missing?

It looks like there already is a storage pool called local which is using the disk /dev/disk/by-id/mmc-BJTD4R_0x5edae852-part3.

You can run zpool list on your system to verify this.

@rmbleeker Have you skipped local storage pool setup during microcloud init?

@rmbleeker

It looks like there already is a storage pool called local which is using the disk /dev/disk/by-id/mmc-BJTD4R_0x5edae852-part3.

You can run zpool list on your system to verify this.

I realize that's what it looks like, but it's not the case. zpool list came up empty (no pools available). In fact it still does since the storage pool is still pending and hasn't been created yet.

@rmbleeker Have you skipped local storage pool setup during microcloud init?

Yes I have.

@rmbleeker

Alright, it seems to work when I pick a different approach and slightly alter the commands. I got the idea from the Web UI, which states that when creating a ZFS storage pool, the name of an existing ZFS pool is a valid source. So I created a storage pool with

sudo zpool create -f -m none -O compression=on local /dev/disk/by-id/mmc-BJTD4R_0x5edae852-part3

on each node, filling in the proper disk ID for each one. I then used

sudo lxc storage create local zfs source=local --target=${nodename}

to create the local storage, filling in the name of each node in the cluster as the target. Then finally

sudo lxc storage create local zfs

properly initialized the storage pool, giving it the CREATED state instead of PENDING or ERRORED. It cost me an extra step, which isn't a big deal, but it's still a workaround and not a solution in my view.
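Putting the workaround together for a four-node cluster, the whole sequence looks roughly like this (disk IDs and node names are examples):

# on every node, create the zpool directly, forcing past any stale label
sudo zpool create -f -m none -O compression=on local /dev/disk/by-id/mmc-EXAMPLE-part3

# from one node, register the existing zpool as the source for each cluster member
sudo lxc storage create local zfs source=local --target node1
sudo lxc storage create local zfs source=local --target node2
sudo lxc storage create local zfs source=local --target node3
sudo lxc storage create local zfs source=local --target node4

# finally, from any node, turn the pending pool into a created one
sudo lxc storage create local zfs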

@masnax
Contributor

masnax commented Oct 22, 2024

Out of curiosity, if you have another partition you're able to test on, I'd be very interested to see whether the storage pool can be created with a name other than local.

The setup that eventually worked for you seems to just ignore the existing pool error with the -f flag. It's not yet clear if this is an issue with existing zpool state or some race when creating the pool in LXD.

@rmbleeker

There are no other disks or partitions available on the nodes, but since I wasn't far into my project anyway I decided to do some testing and flash the nodes again with a fresh image. I did this twice and set up the cluster again both times. After the first time I used the lxc storage create commands to create a ZFS storage pool with the partition as its source, giving it the name local-zfs. This got me the same errors, leaving the storage pool in the ERRORED state. The second time I used zpool create to first create a pool named local-zfs on the partition, and then used the lxc commands to use that pool as a source for the storage pool. This worked without using the -f flag to force overriding an existing pool, except on node 2, where it claimed a pool named local already existed on the partition.

With all that said and done, these tests weren't conclusive. The fact that the issue still occurred on node 2 after applying a fresh image leads me to believe that some remnants of the contents of a partition are left behind when you re-create the partition with exactly the same parameters, if the storage device isn't properly overwritten beforehand. But apparently that's not always the case, because I could create a new pool without forcing it on 3 of the 4 nodes.

In any case I think that perhaps a --force flag should be implemented for the lxc storage create command, which is then passed along to the underlying command that is used to create a storage pool, just so you can resolve errors like the one I ran into.

@roosterfish
Contributor

In any case I think that perhaps a --force flag should be implemented for the lxc storage create command

You can already pass source.wipe=true when creating the storage pool to wipe the source before trying to create the pool.
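For example, something along these lines per member wipes the partition and creates the pool in one go (paths and member names are examples):

# per cluster member, wiping the source before the pool is created on it
lxc storage create local zfs source=/dev/disk/by-id/mmc-EXAMPLE-part3 source.wipe=true --target node1

# then finalize from any member
lxc storage create local zfs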
