
Conversation

@Wescoeur
Member

  • Simplify scan logic by removing XAPI calls.
  • Create one XAPI session / LINSTOR connection in each worker thread.

@Wescoeur
Member Author

Wescoeur commented Nov 29, 2025

This code is clearly not perfect or clean. We need to discuss it if we want to go with this approach. I tried to do this quickly because we're going to need it. I've probably left some errors, and we'll have to test it thoroughly.

@Wescoeur Wescoeur requested a review from klmp200 December 1, 2025 12:34
stormi and others added 22 commits December 2, 2025 00:23
This was a patch added to the sm RPM git repo before we had this
forked git repo for sm in the xcp-ng github organisation.
Originally-by: Ronan Abhamon <ronan.abhamon@vates.fr>

This version was obtained through a merge in
ff1bf65:

 git restore -SW -s ydi/forks/2.30.7/xfs drivers/EXTSR.py
 mv drivers/EXTSR.py drivers/XFSSR.py
 git restore -SW drivers/EXTSR.py

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Some important points:

- linstor.KV must use an identifier name that starts with a letter (so it uses a "sr-" prefix).

- Encrypted VDIs are supported via the key_hash attribute (not tested, experimental).

- When a new LINSTOR volume is created on a host (via snapshot or create), the remaining diskless
devices are not necessarily created on the other hosts. So if a resource definition exists without
a local device path, we request one from LINSTOR. We wait 5s for symlink creation when a new volume
is created; the 5s value is purely arbitrary, but it guarantees that we do not try to access the
volume before the symlink has been created by the udev rule.

- Can change the provisioning using the device config 'provisioning' param.

- We can only increase volume size (see LINBIT/linstor-server#66);
it would be great if we could shrink volumes to limit the space used by snapshots.

- Inflate/Deflate can only be executed on the master host; a linstor-manager plugin is provided
to do this from slaves. The same plugin is used to open LINSTOR ports and start the controller.

- Use a `total_allocated_volume_size` method to get a good idea of the reserved space.
Why? Because `physical_free_size` is computed using the LVM used size; in the case of thick provisioning this is fine,
but when thin provisioning is chosen LVM only returns the allocated size based on the used block count. This method
solves the problem: it uses the fixed virtual volume size of each node to compute the size required to store the
volume data.

- Call vhd-util on remote hosts using the linstor-manager plugin when necessary: when vhd-util is called to get VHD info,
the DRBD device can be in use (and unusable by external processes), so we must use the local LVM device that
contains the DRBD data, or a remote disk if the DRBD device is diskless.

- If a DRBD device is in use when vhdutil.getVHDInfo is called, we must not
get any errors. So a LinstorVhdUtil wrapper is now used to bypass the DRBD layer when
VDIs are loaded.

- Refresh PhyLink when unpause is called on DRBD devices:
We must always recreate the symlink to ensure we have
the right info. Why? Because if the volume UUID is changed in
LINSTOR the symlink is not directly updated. When live leaf
coalesce is executed we have these steps:
"A" -> "OLD_A"
"B" -> "A"
Without a symlink update the previous "A" path is reused instead of
the "B" path. Note: "A", "B" and "OLD_A" are UUIDs.

- Since the linstor python modules are not present on every XCP-ng host,
module imports are protected by try... except... blocks (a minimal sketch follows).
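
A minimal sketch of this guarded import pattern, for illustration only (the flag name is an assumption, not the actual driver code):

```python
# Sketch only; LINSTOR_AVAILABLE is an illustrative name.
try:
    import linstor
    LINSTOR_AVAILABLE = True
except ImportError:
    # The linstor python modules are not installed on this host.
    LINSTOR_AVAILABLE = False
```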

- Provide a linstor-monitor daemon to check master changes
- Check if "create" doesn't succeed without zfs packages
- Check if "scan" failed if the path is not mounted (not a ZFS mountpoint)
Co-authored-by: Piotr Robert Konopelko <piotr.konopelko@moosefs.pro>
Signed-off-by: Aleksander Wieliczko <aleksander.wieliczko@moosefs.pro>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
`umount` should not be called when `legacy_mode` is enabled, otherwise a mounted dir
used during SR creation is unmounted at the end of the `create` call (and also
when a PBD is unplugged) in the `detach` block.
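
A small sketch of the intended guard, assuming a helper of this shape (the function and its arguments are illustrative, not the actual driver code):

```python
import subprocess

# Sketch: only unmount in detach when the SR is not in legacy mode.
def maybe_umount(mountpoint, legacy_mode):
    if not legacy_mode:
        subprocess.check_call(['umount', mountpoint])
```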

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
A sm-config boolean param `subdir` is available to configure where to store the VHDs:
- In a subdir with the SR UUID, the new behavior
- In the root directory of the MooseFS SR

By default, new SRs are created with `subdir` = True.
Existing SRs are not modified and continue to use the folder that was given at
SR creation directly, without looking for a subdirectory.
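
For illustration, the path resolution described above could look like this (a sketch; the helper and argument names are assumptions, not the actual MooseFS driver code):

```python
import os

# Sketch of the `subdir` behaviour: new SRs store VHDs in <mountpoint>/<sr_uuid>,
# legacy SRs keep using the mountpoint itself.
def vhd_directory(mountpoint, sr_uuid, sm_config):
    use_subdir = sm_config.get('subdir', 'true').lower() == 'true'
    return os.path.join(mountpoint, sr_uuid) if use_subdir else mountpoint
```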

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Ensure all shared drivers are imported in the `_is_open` definition to register
them in the driver list. Otherwise this function always fails with an SRUnknownType exception.

Also, we must add two fake mandatory parameters to make MooseFS happy: `masterhost` and `rootpath`.
Same for CephFS with `serverpath`. (The NFS driver is directly patched to ensure there is no usage of
the `serverpath` param because its value is None.)

The `location` param is required to use ZFS, or to be more precise, by the parent class `FileSR`.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
SR_CACHING offers the ability to use IntelliCache, but this
feature is only available with the NFS SR.

For more details, the implementation of `_setup_cache` in blktap2.py
uses only an instance of NFSFileVDI for the shared target.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
* `except` syntax fixes
* drop `has_key()` usage
* drop `filter()` usage (and drop their silly `list(x.keys())` wrappings)
* drop `map()` usage
* use `int` not `long`
* use `items()` not `iteritems()`
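
For illustration, the typical Python 3 replacements listed above look like this (examples only, not taken from the drivers):

```python
# Illustrative examples of the Python 3 idioms.
d = {'a': 1, 'b': 2}

'a' in d                                        # instead of d.has_key('a')
for k, v in d.items():                          # instead of d.iteritems()
    pass
evens = [x for x in d.values() if x % 2 == 0]   # instead of filter(...)
doubled = [x * 2 for x in d.values()]           # instead of map(...)
big = 2 ** 64                                   # plain int, no long literal
```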

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
…store

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Guided by futurize's "old_div" use

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
PROBE_MOUNTPOINT in some drivers is a relative path, which is resolved
using MOUNT_BASE at probe time, but in CephFS, GlusterFS and MooseFS it is
set on driver load to an absolute path, and this requires MOUNT_BASE to
look like a path component.

```
drivers/CephFSSR.py:69: in <module>
    PROBE_MOUNTPOINT = os.path.join(SR.MOUNT_BASE, "probe")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = <MagicMock name='mock.MOUNT_BASE' id='140396863897728'>, p = ('probe',)

    def join(a, *p):
        """Join two or more pathname components, inserting '/' as needed.
        If any component is an absolute path, all previous path components
        will be discarded.  An empty last part will result in a path that
        ends with a separator."""
>       a = os.fspath(a)
E       TypeError: expected str, bytes or os.PathLike object, not MagicMock

/usr/lib64/python3.6/posixpath.py:80: TypeError
```

Note this same idiom is also used in upstream SMBFS, although that does not
appear to cause any problem with the tests.

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
(coverage 7.2.5)

Without these changes many warnings/errors are emitted:
- "assertEquals" is deprecated, "assertEqual" must be used instead
- mocked objects in the "setUp" method like "cleanup.IPCFlag" cannot be repatched
  at the level of the test functions, otherwise tests are aborted;
  this is the behavior of coverage version 7.2.5

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
The probe method is not implemented so we
shouldn't advertise it.

Signed-off-by: BenjiReis <benjamin.reis@vates.fr>
Impacted drivers: LINSTOR, MooseFS and ZFS.
- Ignore all linstor.* members during coverage,
  the module is not installed on the GitHub runner.
- Use mock from unittest, the old one is not found anymore.
- Remove useless return from the LinstorSR scan method.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Nambrok and others added 22 commits December 2, 2025 00:23
This bug is minor but it makes it difficult to understand why snapshots
fail since the initial trace is lost due to the exception caused by the
reference to the non-existing variable "e".

Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
The GC can be interrupted by a SIGTERM signal. If this is caught while modifying
a volume's hidden flag, this can have bad consequences.

For example, in the situation below, the hidden flag of a volume has been changed
but the cached value (self.hidden) in the Python process still has the old value
because of the 'util.CommandException' exception that was thrown. A VDI
that normally should not be hidden is still hidden after executing
`_undoInterruptedCoalesceLeaf` because the cached hidden value was not the correct one.

Code:
```
    def _setHidden(self, hidden=True):
        vhdutil.setHidden(self.path, hidden)
        # Exception! Next line is never executed.
        self.hidden = hidden
```

Trace:
```
Jun  5 09:15:50 r620-q6 SMGC: [563219] Removed vhd-parent from dce4b0fc(2.000G/170.336M?)
Jun  5 09:15:50 r620-q6 SMGC: [563219] Removed vhd-blocks from dce4b0fc(2.000G/170.336M?)
Jun  5 09:15:50 r620-q6 SM: [563219] ['/usr/bin/vhd-util', 'set', '--debug', '-n', '/var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/OLD_dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd', '-f', 'hidden', '-v', '1']
Jun  5 09:15:50 r620-q6 SM: [563219] GC: recieved SIGTERM
Jun  5 09:15:50 r620-q6 SM: [563219] FAILED in util.pread: (rc -15) stdout: '', stderr: ''
Jun  5 09:15:50 r620-q6 SMGC: [563219] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun  5 09:15:50 r620-q6 SMGC: [563219]          ***********************
Jun  5 09:15:50 r620-q6 SMGC: [563219]          *  E X C E P T I O N  *
Jun  5 09:15:50 r620-q6 SMGC: [563219]          ***********************
Jun  5 09:15:50 r620-q6 SMGC: [563219] _doCoalesceLeaf: EXCEPTION <class 'util.CommandException'>, Signalled 15
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 2653, in _liveLeafCoalesce
Jun  5 09:15:50 r620-q6 SMGC: [563219]     self._doCoalesceLeaf(vdi)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 2717, in _doCoalesceLeaf
Jun  5 09:15:50 r620-q6 SMGC: [563219]     vdi._setHidden(True)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 1063, in _setHidden
Jun  5 09:15:50 r620-q6 SMGC: [563219]     vhdutil.setHidden(self.path, hidden)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 235, in setHidden
Jun  5 09:15:50 r620-q6 SMGC: [563219]     ret = ioretry(cmd)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 94, in ioretry
Jun  5 09:15:50 r620-q6 SMGC: [563219]     errlist=[errno.EIO, errno.EAGAIN])
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 347, in ioretry
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return f()
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 93, in <lambda>
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return util.ioretry(lambda: util.pread2(cmd, text=text),
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 255, in pread2
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return pread(cmdlist, quiet=quiet, text=text)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 217, in pread
Jun  5 09:15:50 r620-q6 SMGC: [563219]     raise CommandException(rc, str(cmdlist), stderr.strip())
Jun  5 09:15:50 r620-q6 SMGC: [563219]
Jun  5 09:15:50 r620-q6 SMGC: [563219] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun  5 09:15:50 r620-q6 SMGC: [563219] *** UNDO LEAF-COALESCE
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming parent back: dce4b0fc-6ad1-4750-857b-45d8d2758503 -> 056b6f93-66ff-460a-9354-157540b584a8
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd -> /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/056b6f93-66ff-460a-9354-157540b584a8.vhd
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming child back to dce4b0fc-6ad1-4750-857b-45d8d2758503
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/OLD_dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd -> /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd
Jun  5 09:15:50 r620-q6 SMGC: [563219] Updating the VDI record
Jun  5 09:15:50 r620-q6 SMGC: [563219] Set vhd-parent = 056b6f93-66ff-460a-9354-157540b584a8 for dce4b0fc(2.000G/8.500K?)
Jun  5 09:15:50 r620-q6 SMGC: [563219] Set vdi_type = vhd for dce4b0fc(2.000G/8.500K?)
Jun  5 09:15:50 r620-q6 SM: [563219] ['/usr/bin/vhd-util', 'set', '--debug', '-n', '/var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/056b6f93-66ff-460a-9354-157540b584a8.vhd', '-f', 'hidden', '-v', '1']
Jun  5 09:15:50 r620-q6 SM: [563219]   pread SUCCESS
Jun  5 09:15:50 r620-q6 SMGC: [563219] *** leaf-coalesce undo successful
```

Therefore, a VDI impacted by this problem remains hidden and can no longer
be used correctly without manual intervention:
```
Jun  5 09:16:29 r620-q6 SM: [566174] lock: released /var/lock/sm/f816795d-e7a9-43df-170c-23bc329607fc/sr
Jun  5 09:16:29 r620-q6 SM: [566174] ***** generic exception: vdi_clone: EXCEPTION <class 'xs_errors.SROSError'>, Failed to clone VDI [opterr=hidden VDI]
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 113, in run
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._run_locked(sr)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 163, in _run_locked
Jun  5 09:16:29 r620-q6 SM: [566174]     rv = self._run(sr, target)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 270, in _run
Jun  5 09:16:29 r620-q6 SM: [566174]     return target.clone(self.params['sr_uuid'], self.vdi_uuid)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 704, in clone
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._do_snapshot(sr_uuid, vdi_uuid, VDI.SNAPSHOT_DOUBLE)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 754, in _do_snapshot
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._snapshot(snapType, cbtlog, consistency_state)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 797, in _snapshot
Jun  5 09:16:29 r620-q6 SM: [566174]     raise xs_errors.XenError('VDIClone', opterr='hidden VDI')
Jun  5 09:16:29 r620-q6 SM: [566174]
```
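
One possible mitigation is sketched below, under the assumption that deferring SIGTERM around the flag update is acceptable; this is a sketch of the idea, not necessarily the fix implemented by this commit:

```python
import os
import signal
from contextlib import contextmanager

@contextmanager
def deferred_sigterm():
    # Queue SIGTERM while the hidden flag and its cached value are updated,
    # so the on-disk state and self.hidden cannot diverge.
    received = []
    previous = signal.signal(signal.SIGTERM, lambda signum, frame: received.append(signum))
    try:
        yield
    finally:
        signal.signal(signal.SIGTERM, previous)
        if received:
            # Re-deliver the deferred signal once the critical section is done.
            os.kill(os.getpid(), signal.SIGTERM)
```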

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
In the event of a network outage on a LINSTOR host where the
controller is running, a rather problematic situation can occur:
the `/var/lib/linstor` folder may remain mounted (in RO mode) while
`xcp-persistent-database` has become PRIMARY on another machine.

This situation occurs following a jbd2/ext4 kernel freeze lasting
several minutes.

Trace of the temporary blockage:
```
Jul  8 15:05:39 xcp-ng-ha-1 kernel: [98867.434915] r8125: eth2: link down
Jul  8 15:06:03 xcp-ng-ha-1 kernel: [98890.897922] r8125: eth2: link up
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001306] INFO: task jbd2/drbd1000-8:736989 blocked for more than 120 seconds.
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001314]       Tainted: G           O      4.19.0+1 #1
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001319] jbd2/drbd1000-8 D    0 736989      2 0x80000000
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001321] Call Trace:
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001330]  ? __schedule+0x2a6/0x880
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001331]  schedule+0x32/0x80
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001334]  jbd2_journal_commit_transaction+0x260/0x1896
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001336]  ? __switch_to_asm+0x34/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001337]  ? __switch_to_asm+0x40/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001338]  ? __switch_to_asm+0x34/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001339]  ? __switch_to_asm+0x40/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001340]  ? __switch_to_asm+0x34/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001341]  ? __switch_to_asm+0x40/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001342]  ? __switch_to_asm+0x34/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001343]  ? __switch_to_asm+0x40/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001346]  ? wait_woken+0x80/0x80
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001348]  ? try_to_del_timer_sync+0x4d/0x80
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001350]  kjournald2+0xc1/0x260
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001351]  ? wait_woken+0x80/0x80
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001353]  kthread+0xf8/0x130
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001355]  ? commit_timeout+0x10/0x10
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001356]  ? kthread_bind+0x10/0x10
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001357]  ret_from_fork+0x22/0x40
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830064] INFO: task jbd2/drbd1000-8:736989 blocked for more than 120 seconds.
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830071]       Tainted: G           O      4.19.0+1 #1
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830074] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830076] jbd2/drbd1000-8 D    0 736989      2 0x80000000
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830078] Call Trace:
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830086]  ? __schedule+0x2a6/0x880
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830088]  schedule+0x32/0x80
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830091]  jbd2_journal_commit_transaction+0x260/0x1896
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830093]  ? __switch_to_asm+0x34/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830094]  ? __switch_to_asm+0x40/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830095]  ? __switch_to_asm+0x34/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830096]  ? __switch_to_asm+0x40/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830097]  ? __switch_to_asm+0x34/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830098]  ? __switch_to_asm+0x40/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830099]  ? __switch_to_asm+0x34/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830100]  ? __switch_to_asm+0x40/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830103]  ? wait_woken+0x80/0x80
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830105]  ? try_to_del_timer_sync+0x4d/0x80
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830107]  kjournald2+0xc1/0x260
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830108]  ? wait_woken+0x80/0x80
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830110]  kthread+0xf8/0x130
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830112]  ? commit_timeout+0x10/0x10
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830113]  ? kthread_bind+0x10/0x10
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830114]  ret_from_fork+0x22/0x40
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731530] drbd_reject_write_early: 2 callbacks suppressed
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731541] Aborting journal on device drbd1000-8.
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731544] Buffer I/O error on dev drbd1000, logical block 131072, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731546] JBD2: Error -5 detected when updating journal superblock for drbd1000-8.
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731549] EXT4-fs error (device drbd1000) in ext4_reserve_inode_write:5872: Journal has aborted
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731556] Buffer I/O error on dev drbd1000, logical block 0, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731562] EXT4-fs (drbd1000): I/O error while writing superblock
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731565] EXT4-fs error (device drbd1000) in ext4_orphan_add:2822: Journal has aborted
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731569] Buffer I/O error on dev drbd1000, logical block 0, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731571] EXT4-fs (drbd1000): I/O error while writing superblock
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731575] EXT4-fs error (device drbd1000) in ext4_reserve_inode_write:5872: Journal has aborted
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731578] Buffer I/O error on dev drbd1000, logical block 0, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731581] EXT4-fs (drbd1000): I/O error while writing superblock
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731586] EXT4-fs error (device drbd1000) in ext4_truncate:4527: Journal has aborted
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731589] Buffer I/O error on dev drbd1000, logical block 0, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731592] EXT4-fs (drbd1000): I/O error while writing superblock
```

On the drbd-monitor side, here's what happens: we failed
to stop the controller, and it was subsequently killed by systemd.
Then an attempt to unmount `/var/lib/linstor` failed completely:
```
Jul  8 15:10:15 xcp-ng-ha-1 systemd[1]: linstor-controller.service stop-final-sigterm timed out. Killing.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: linstor-controller.service still around after final SIGKILL. Entering failed mode.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: Stopped drbd-reactor controlled linstor-controller.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: Unit linstor-controller.service entered failed state.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: linstor-controller.service failed.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: Stopping drbd-reactor controlled var-lib-linstor...
Jul  8 15:11:48 xcp-ng-ha-1 Satellite[739516]: 2025-07-08 15:11:48.312 [MainWorkerPool-8] INFO  LINSTOR/Satellite/000010 SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807
Jul  8 15:11:48 xcp-ng-ha-1 Satellite[739516]: 2025-07-08 15:11:48.447 [MainWorkerPool-8] INFO  LINSTOR/Satellite/000010 SYSTEM - SpaceInfo: xcp-sr-linstor_group_thin_device -> 430950298/444645376
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: var-lib-linstor.service: control process exited, code=exited status=32
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: Stopped drbd-reactor controlled var-lib-linstor.
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: Unit var-lib-linstor.service entered failed state.
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: var-lib-linstor.service failed.
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: Stopping Promotion of DRBD resource xcp-persistent-database...
```

In this situation, the host will not be able to run the controller
again without manually unmounting `/var/lib/linstor`. The solution
to this problem is to attempt a `umount` call with the lazy option.
This option can be dangerous in many situations, but here we don't
have much choice:
- The DRBD resource is technically no longer PRIMARY and therefore
  no longer accessible
- The controller has been stopped
- No writing is possible
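
A minimal sketch of this lazy-unmount fallback (the wrapper is illustrative, not the actual drbd-monitor code):

```python
import subprocess

# Sketch: try a normal unmount first, fall back to a lazy unmount.
def umount_linstor_database(path='/var/lib/linstor'):
    try:
        subprocess.check_call(['umount', path])
    except subprocess.CalledProcessError:
        # Last resort: lazy unmount. Acceptable here because the DRBD resource
        # is no longer PRIMARY, the controller is stopped and no write can occur.
        subprocess.check_call(['umount', '-l', path])
```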

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Try to use the host_OpaqueRef to access the primary, then try on the master
host if that doesn't work, then find the primary with the LINSTOR API or,
if there is no primary, any other host.

Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
fix(linstor): prevent use of e before assignment in nested try-except
fix(linstor): use util.get_master_ref to get the master ref
fix(linstor): log host_ref instead of UUID to prevent XAPI call
fix(log_failed_call): set error value for the call without an actual error
fix(linstorhostcall): use next iter instead of list conversion
cleanup(linstor): remove currently unused get_primary function

Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Co-authored-by: Damien Thenot <damien.thenot@vates.tech>
Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Upstream patch of ae10349 is incorrect.

All "@mock.patch('blktap2.VDI.PhyLink', autospec=True)" lines must be removed
because PhyLink is mocked globally.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
- Use specific DRBD options to detect failures within a short delay.
- Use these options to control quorum with drbd-reactor.
- Provide a better compromise in terms of availability.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Impacted functions: `_get_volumes_info` and `_get_volume_node_names_and_size`.

Before this change "usable_size" validity was
checked too early and which could lead to an exception for
no good reason while the size could be known on at
least one host despite an issue on other machines.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
…ll context

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
…mutators

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
The session attr is not set during "attach/detach calls from config".
In this context the local method must always be called.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
A change in lvm2 (https://github.com/xcp-ng-rpms/lvm2/pull/3/files)
introduces an issue in LargeBlockSR: `/dev/` is no longer scanned, meaning
the loop device is never used for VG activation. So we must add a custom
scan parameter to LVM commands.
We also now systematically call _redo_vg_connection to use our
custom parameters to enable the LV on the correct device before calling
`EXTSR.attach()`.
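
As an illustration only, an LVM command can be given an explicit device scan configuration on its command line; the exact parameter used by LargeBlockSR may differ:

```python
import subprocess

# Hedged sketch: force /dev/ into the LVM device scan for a single command so
# the loop device is seen again during VG activation.
def activate_vg_with_dev_scan(vg_name):
    subprocess.check_call([
        'vgchange', '-ay', vg_name,
        '--config', 'devices { scan = [ "/dev" ] }'
    ])
```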

Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
This is not done on each and every implementation of SR but only on the ones that call cleanup.start_gc_service (like FileSR)
and on the classes that inherit from them and don't call super on detach.

This is to prevent useless error logs like "Failed to stop xxx.service: Unit xxx.service not loaded".

Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
When the pool master is changed and the new master doesn't have a local DB path,
`get_database_path` fails during the SR.scan call.
This patch allows creating a diskless path if necessary.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
…104)

In `_request_device_path`:
Before this change, an exception was thrown when a resource was missing,
but not when the returned path was empty. Now it's raised in both cases.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Add a way in `linstorvolumemanager` to verify that all nodes are using the same LINSTOR version at init.
Raise an error early if a mismatch is detected so that SR ops are properly disabled with clear feedback to the user.
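
A hedged sketch of such a check; `versions_by_node` is assumed to be a mapping built from LINSTOR node information, not a python-linstor API call:

```python
# Sketch: refuse to continue if the nodes report different LINSTOR versions.
def assert_same_linstor_version(versions_by_node):
    distinct = set(versions_by_node.values())
    if len(distinct) > 1:
        detail = ', '.join(
            '{}={}'.format(node, version)
            for node, version in sorted(versions_by_node.items())
        )
        raise Exception('LINSTOR version mismatch across nodes: ' + detail)
```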

Signed-off-by: Antoine Bartuccio <antoine.bartuccio@vates.tech>
Avoid a Python version mismatch that pulls incompatible dependencies in GitHub Actions when running unit tests.

Signed-off-by: Antoine Bartuccio <antoine.bartuccio@vates.tech>
- Simplify scan logic by removing XAPI calls.
- Create one XAPI session / LINSTOR connection in each worker thread.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
@Wescoeur Wescoeur force-pushed the ran-xostor-fast-scan branch from 7fa711a to a1eaa0e Compare December 1, 2025 23:39
Comment on lines +1227 to +1231
all_executor_load.append(load)

session = XenAPI.xapi_local()
session.xenapi.login_with_password('root', '', '', 'SM')
load._session = session


Not sure why you mutate load after adding it to all_executor_load. IMHO it would read better if you did:

            load._session = session
            all_executor_load.append(load)

Comment on lines 3544 to 3570
def init_executor_thread():
    class Load(object):
        def __init__(self):
            self._session = None

        def cleanup(self):
            if self._session:
                self._session.xenapi.session.logout()

    load = Load()

    all_executor_load.append(load)

    session = XenAPI.xapi_local()
    session.xenapi.login_with_password('root', '', '', 'SM')
    load._session = session
    executor_data.session = session

    linstor = LinstorVolumeManager(
        self._linstor.uri,
        self._linstor.group_name,
        repair=False,
        logger=Util.log
    )
    executor_data.linstor = linstor

    executor_data.vhdutil = LinstorVhdUtil(session, linstor)


Most of this code seems to be repeated. Would it be possible to make it a util?

Both the executor_data and all_executor_load could be passed as arguments, maybe along with an additional function to run some specific code (like the LINSTOR part).
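
For instance, something along these lines (a sketch only, reusing the names from the diff above; `extra_init` is a hypothetical callback for the driver-specific part):

```python
def make_executor_initializer(executor_data, all_executor_load, extra_init=None):
    # Shared per-worker-thread setup: one XAPI session per thread, registered
    # in all_executor_load for later cleanup.
    def init_executor_thread():
        load = Load()  # Load as defined in the diff above

        session = XenAPI.xapi_local()
        session.xenapi.login_with_password('root', '', '', 'SM')
        load._session = session
        executor_data.session = session
        all_executor_load.append(load)

        if extra_init:
            extra_init(session, executor_data)  # e.g. the LINSTOR-specific setup

    return init_executor_thread
```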

Comment on lines +3555 to +3559
all_executor_load.append(load)

session = XenAPI.xapi_local()
session.xenapi.login_with_password('root', '', '', 'SM')
load._session = session


Same as below, I'd prefer to not mutate load after appending it.

Comment on lines +3580 to +3590
try:
    with ThreadPoolExecutor() as executor:
        for info in executor.map(load_info, pending_vdi_uuids):
            all_vdi_info[info.uuid] = info
finally:
    for load in all_executor_load:
        try:
            load.cleanup()
        except Exception as e:
            Util.log(f"Failed to clean load executor: {e}")
    all_executor_load.clear()


This is also repeated twice. I think it could be made generic at some point.
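
For example, the repeated pattern could be wrapped once (sketch only, reusing the names from the diff above):

```python
def run_with_executor_cleanup(fn, all_executor_load):
    # Run the executor-based work, then always clean up the per-thread loads.
    try:
        return fn()
    finally:
        for load in all_executor_load:
            try:
                load.cleanup()
            except Exception as e:
                Util.log(f"Failed to clean load executor: {e}")
        all_executor_load.clear()
```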

# Make sure this call never stucks because this function can be called
# during HA init and in this case we can wait forever.
session = util.timeout_call(10, util.get_localAPI_session)
session = util.get_localAPI_session()

@Millefeuille42 Millefeuille42 Dec 8, 2025


Could you please explain in comments or at least in the PR why you drop the timeout?
