
Conversation

@Wescoeur
Member

  • Simplify scan logic by removing XAPI calls.
  • Create one XAPI session / LINSTOR connection in each worker thread.

@Wescoeur
Member Author

Wescoeur commented Nov 29, 2025

This code is clearly not perfect or clean. We need to discuss it if we want to go with this approach. I tried to do this quickly because we're going to need it. I've probably left some errors, and we'll have to test it thoroughly.

@Wescoeur Wescoeur requested a review from klmp200 December 1, 2025 12:34
stormi and others added 22 commits December 2, 2025 00:23
This was a patch added to the sm RPM git repo before we had this
forked git repo for sm in the xcp-ng github organisation.
Originally-by: Ronan Abhamon <ronan.abhamon@vates.fr>

This version was obtained through a merge in
ff1bf65:

 git restore -SW -s ydi/forks/2.30.7/xfs drivers/EXTSR.py
 mv drivers/EXTSR.py drivers/XFSSR.py
 git restore -SW drivers/EXTSR.py

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Some important points:

- linstor.KV must use an identifier name that starts with a letter (so it uses a "sr-" prefix).

- Encrypted VDIs are supported via the key_hash attribute (not tested, experimental).

- When a new LINSTOR volume is created on a host (via snapshot or create), the remaining diskless
devices are not necessarily created on the other hosts. So if a resource definition exists without
a local device path, we request one from LINSTOR. We wait 5s for symlink creation when a new volume
is created; the 5s value is purely arbitrary, but it guarantees that we do not try to access the
volume before the symlink has been created by the udev rule.

- Can change the provisioning using the device config 'provisioning' param.

- We can only increase volume size (see LINBIT/linstor-server#66);
it would be great if we could shrink volumes to limit the space used by snapshots.

- Inflate/Deflate can only be executed on the master host; a linstor-manager plugin is provided
to do this from slaves. The same plugin is used to open LINSTOR ports and start the controller.

- Use a `total_allocated_volume_size` method to get a good idea of the reserved space.
Why? Because `physical_free_size` is computed using the LVM used size; in the case of thick provisioning this is fine,
but when thin provisioning is chosen LVM only returns the allocated size based on the used block count. This method
solves the problem: it uses the fixed virtual volume size of each node to compute the size required to store the
volume data.

- Call vhd-util on remote hosts using the linstor-manager plugin when necessary: when vhd-util is called to get VHD info,
the DRBD device can be in use (and unusable by external processes), so we must use the local LVM device that
contains the DRBD data, or a remote disk if the DRBD device is diskless.

- If a DRBD device is in use when vhdutil.getVHDInfo is called, we must not
get any errors. So a LinstorVhdUtil wrapper is now used to bypass the DRBD layer when
VDIs are loaded.

- Refresh PhyLink when unpause is called on DRBD devices:
We must always recreate the symlink to ensure we have
the right info. Why? Because if the volume UUID is changed in
LINSTOR the symlink is not directly updated. When live leaf
coalesce is executed we have these steps:
"A" -> "OLD_A"
"B" -> "A"
Without a symlink update the previous "A" path is reused instead of
the "B" path. Note: "A", "B" and "OLD_A" are UUIDs.

- Since the linstor python modules are not present on every XCP-ng host,
module imports are protected by try... except... blocks (a minimal sketch follows).
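
A minimal sketch of this guarded import pattern, for illustration only (the flag name is an assumption, not the actual driver code):

```python
# Sketch only; LINSTOR_AVAILABLE is an illustrative name.
try:
    import linstor
    LINSTOR_AVAILABLE = True
except ImportError:
    # The linstor python modules are not installed on this host.
    LINSTOR_AVAILABLE = False
```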

- Provide a linstor-monitor daemon to check master changes
- Check if "create" doesn't succeed without zfs packages
- Check if "scan" failed if the path is not mounted (not a ZFS mountpoint)
Co-authored-by: Piotr Robert Konopelko <piotr.konopelko@moosefs.pro>
Signed-off-by: Aleksander Wieliczko <aleksander.wieliczko@moosefs.pro>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
`umount` should not be called when `legacy_mode` is enabled, otherwise a mounted dir
used during SR creation is unmounted at the end of the `create` call (and also
when a PBD is unplugged) in the `detach` block.
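
A small sketch of the intended guard, assuming a helper of this shape (the function and its arguments are illustrative, not the actual driver code):

```python
import subprocess

# Sketch: only unmount in detach when the SR is not in legacy mode.
def maybe_umount(mountpoint, legacy_mode):
    if not legacy_mode:
        subprocess.check_call(['umount', mountpoint])
```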

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
A sm-config boolean param `subdir` is available to configure where to store the VHDs:
- In a subdir with the SR UUID, the new behavior
- In the root directory of the MooseFS SR

By default, new SRs are created with `subdir` = True.
Existing SRs are not modified and continue to use the folder that was given at
SR creation directly, without looking for a subdirectory.
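
For illustration, the path resolution described above could look like this (a sketch; the helper and argument names are assumptions, not the actual MooseFS driver code):

```python
import os

# Sketch of the `subdir` behaviour: new SRs store VHDs in <mountpoint>/<sr_uuid>,
# legacy SRs keep using the mountpoint itself.
def vhd_directory(mountpoint, sr_uuid, sm_config):
    use_subdir = sm_config.get('subdir', 'true').lower() == 'true'
    return os.path.join(mountpoint, sr_uuid) if use_subdir else mountpoint
```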

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Ensure all shared drivers are imported in the `_is_open` definition to register
them in the driver list. Otherwise this function always fails with an SRUnknownType exception.

Also, we must add two fake mandatory parameters to make MooseFS happy: `masterhost` and `rootpath`.
Same for CephFS with `serverpath`. (The NFS driver is directly patched to ensure there is no usage of
the `serverpath` param because its value is None.)

The `location` param is required to use ZFS, or to be more precise, by the parent class `FileSR`.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
SR_CACHING offers the ability to use IntelliCache, but this
feature is only available with the NFS SR.

For more details, the implementation of `_setup_cache` in blktap2.py
uses only an instance of NFSFileVDI for the shared target.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
* `except` syntax fixes
* drop `has_key()` usage
* drop `filter()` usage (and drop their silly `list(x.keys())` wrappings)
* drop `map()` usage
* use `int` not `long`
* use `items()` not `iteritems()`
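
For illustration, the typical Python 3 replacements listed above look like this (examples only, not taken from the drivers):

```python
# Illustrative examples of the Python 3 idioms.
d = {'a': 1, 'b': 2}

'a' in d                                        # instead of d.has_key('a')
for k, v in d.items():                          # instead of d.iteritems()
    pass
evens = [x for x in d.values() if x % 2 == 0]   # instead of filter(...)
doubled = [x * 2 for x in d.values()]           # instead of map(...)
big = 2 ** 64                                   # plain int, no long literal
```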

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
…store

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Guided by futurize's "old_div" use

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
PROBE_MOUNTPOINT in some drivers is a relative path, which is resolved
using MOUNT_BASE at probe time, but in CephFS, GlusterFS and MooseFS it is
set on driver load to an absolute path, and this requires MOUNT_BASE to
look like a path component.

```
drivers/CephFSSR.py:69: in <module>
    PROBE_MOUNTPOINT = os.path.join(SR.MOUNT_BASE, "probe")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = <MagicMock name='mock.MOUNT_BASE' id='140396863897728'>, p = ('probe',)

    def join(a, *p):
        """Join two or more pathname components, inserting '/' as needed.
        If any component is an absolute path, all previous path components
        will be discarded.  An empty last part will result in a path that
        ends with a separator."""
>       a = os.fspath(a)
E       TypeError: expected str, bytes or os.PathLike object, not MagicMock

/usr/lib64/python3.6/posixpath.py:80: TypeError
```

Note this same idiom is also used in upstream SMBFS, although that does not
appear to cause any problem with the tests.

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
(coverage 7.2.5)

Without these changes many warnings/errors are emitted:
- "assertEquals" is deprecated, "assertEqual" must be used instead
- mocked objects in the "setUp" method like "cleanup.IPCFlag" cannot be repatched
  at the level of the test functions, otherwise tests are aborted;
  this is the behavior of coverage version 7.2.5

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
The probe method is not implemented so we
shouldn't advertise it.

Signed-off-by: BenjiReis <benjamin.reis@vates.fr>
Impacted drivers: LINSTOR, MooseFS and ZFS.
- Ignore all linstor.* members during coverage,
  the module is not installed on the GitHub runner.
- Use mock from unittest, the old one is not found anymore.
- Remove useless return from the LinstorSR scan method.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Nambrok and others added 22 commits December 2, 2025 00:23
This bug is minor but it makes it difficult to understand why snapshots
fail since the initial trace is lost due to the exception caused by the
reference to the non-existing variable "e".

Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
The GC can be interrupted by a SIGTERM signal. If this is caught while modifying
a volume's hidden flag, this can have bad consequences.

For example, in the situation below, the hidden flag of a volume has been changed
but the cached value (self.hidden) in the Python process still has the old value
because of the 'util.CommandException' exception that was thrown. A VDI
that normally should not be hidden is still hidden after executing
`_undoInterruptedCoalesceLeaf` because the cached hidden value was not the correct one.

Code:
```
    def _setHidden(self, hidden=True):
        vhdutil.setHidden(self.path, hidden)
        # Exception! Next line is never executed.
        self.hidden = hidden
```

Trace:
```
Jun  5 09:15:50 r620-q6 SMGC: [563219] Removed vhd-parent from dce4b0fc(2.000G/170.336M?)
Jun  5 09:15:50 r620-q6 SMGC: [563219] Removed vhd-blocks from dce4b0fc(2.000G/170.336M?)
Jun  5 09:15:50 r620-q6 SM: [563219] ['/usr/bin/vhd-util', 'set', '--debug', '-n', '/var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/OLD_dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd', '-f', 'hidden', '-v', '1']
Jun  5 09:15:50 r620-q6 SM: [563219] GC: recieved SIGTERM
Jun  5 09:15:50 r620-q6 SM: [563219] FAILED in util.pread: (rc -15) stdout: '', stderr: ''
Jun  5 09:15:50 r620-q6 SMGC: [563219] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun  5 09:15:50 r620-q6 SMGC: [563219]          ***********************
Jun  5 09:15:50 r620-q6 SMGC: [563219]          *  E X C E P T I O N  *
Jun  5 09:15:50 r620-q6 SMGC: [563219]          ***********************
Jun  5 09:15:50 r620-q6 SMGC: [563219] _doCoalesceLeaf: EXCEPTION <class 'util.CommandException'>, Signalled 15
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 2653, in _liveLeafCoalesce
Jun  5 09:15:50 r620-q6 SMGC: [563219]     self._doCoalesceLeaf(vdi)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 2717, in _doCoalesceLeaf
Jun  5 09:15:50 r620-q6 SMGC: [563219]     vdi._setHidden(True)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 1063, in _setHidden
Jun  5 09:15:50 r620-q6 SMGC: [563219]     vhdutil.setHidden(self.path, hidden)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 235, in setHidden
Jun  5 09:15:50 r620-q6 SMGC: [563219]     ret = ioretry(cmd)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 94, in ioretry
Jun  5 09:15:50 r620-q6 SMGC: [563219]     errlist=[errno.EIO, errno.EAGAIN])
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 347, in ioretry
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return f()
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 93, in <lambda>
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return util.ioretry(lambda: util.pread2(cmd, text=text),
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 255, in pread2
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return pread(cmdlist, quiet=quiet, text=text)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 217, in pread
Jun  5 09:15:50 r620-q6 SMGC: [563219]     raise CommandException(rc, str(cmdlist), stderr.strip())
Jun  5 09:15:50 r620-q6 SMGC: [563219]
Jun  5 09:15:50 r620-q6 SMGC: [563219] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun  5 09:15:50 r620-q6 SMGC: [563219] *** UNDO LEAF-COALESCE
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming parent back: dce4b0fc-6ad1-4750-857b-45d8d2758503 -> 056b6f93-66ff-460a-9354-157540b584a8
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd -> /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/056b6f93-66ff-460a-9354-157540b584a8.vhd
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming child back to dce4b0fc-6ad1-4750-857b-45d8d2758503
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/OLD_dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd -> /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd
Jun  5 09:15:50 r620-q6 SMGC: [563219] Updating the VDI record
Jun  5 09:15:50 r620-q6 SMGC: [563219] Set vhd-parent = 056b6f93-66ff-460a-9354-157540b584a8 for dce4b0fc(2.000G/8.500K?)
Jun  5 09:15:50 r620-q6 SMGC: [563219] Set vdi_type = vhd for dce4b0fc(2.000G/8.500K?)
Jun  5 09:15:50 r620-q6 SM: [563219] ['/usr/bin/vhd-util', 'set', '--debug', '-n', '/var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/056b6f93-66ff-460a-9354-157540b584a8.vhd', '-f', 'hidden', '-v', '1']
Jun  5 09:15:50 r620-q6 SM: [563219]   pread SUCCESS
Jun  5 09:15:50 r620-q6 SMGC: [563219] *** leaf-coalesce undo successful
```

Therefore, a VDI impacted by this problem remains hidden and can no longer
be used correctly without manual intervention:
```
Jun  5 09:16:29 r620-q6 SM: [566174] lock: released /var/lock/sm/f816795d-e7a9-43df-170c-23bc329607fc/sr
Jun  5 09:16:29 r620-q6 SM: [566174] ***** generic exception: vdi_clone: EXCEPTION <class 'xs_errors.SROSError'>, Failed to clone VDI [opterr=hidden VDI]
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 113, in run
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._run_locked(sr)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 163, in _run_locked
Jun  5 09:16:29 r620-q6 SM: [566174]     rv = self._run(sr, target)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 270, in _run
Jun  5 09:16:29 r620-q6 SM: [566174]     return target.clone(self.params['sr_uuid'], self.vdi_uuid)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 704, in clone
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._do_snapshot(sr_uuid, vdi_uuid, VDI.SNAPSHOT_DOUBLE)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 754, in _do_snapshot
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._snapshot(snapType, cbtlog, consistency_state)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 797, in _snapshot
Jun  5 09:16:29 r620-q6 SM: [566174]     raise xs_errors.XenError('VDIClone', opterr='hidden VDI')
Jun  5 09:16:29 r620-q6 SM: [566174]
```
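
One possible mitigation is sketched below, under the assumption that deferring SIGTERM around the flag update is acceptable; this is a sketch of the idea, not necessarily the fix implemented by this commit:

```python
import os
import signal
from contextlib import contextmanager

@contextmanager
def deferred_sigterm():
    # Queue SIGTERM while the hidden flag and its cached value are updated,
    # so the on-disk state and self.hidden cannot diverge.
    received = []
    previous = signal.signal(signal.SIGTERM, lambda signum, frame: received.append(signum))
    try:
        yield
    finally:
        signal.signal(signal.SIGTERM, previous)
        if received:
            # Re-deliver the deferred signal once the critical section is done.
            os.kill(os.getpid(), signal.SIGTERM)
```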

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
In the event of a network outage on a LINSTOR host where the
controller is running, a rather problematic situation can occur:
the `/var/lib/linstor` folder may remain mounted (in RO mode) while
`xcp-persistent-database` has become PRIMARY on another machine.

This situation occurs following a jbd2/ext4 kernel freeze lasting
several minutes.

Trace of the temporary blockage:
```
Jul  8 15:05:39 xcp-ng-ha-1 kernel: [98867.434915] r8125: eth2: link down
Jul  8 15:06:03 xcp-ng-ha-1 kernel: [98890.897922] r8125: eth2: link up
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001306] INFO: task jbd2/drbd1000-8:736989 blocked for more than 120 seconds.
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001314]       Tainted: G           O      4.19.0+1 #1
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001319] jbd2/drbd1000-8 D    0 736989      2 0x80000000
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001321] Call Trace:
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001330]  ? __schedule+0x2a6/0x880
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001331]  schedule+0x32/0x80
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001334]  jbd2_journal_commit_transaction+0x260/0x1896
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001336]  ? __switch_to_asm+0x34/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001337]  ? __switch_to_asm+0x40/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001338]  ? __switch_to_asm+0x34/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001339]  ? __switch_to_asm+0x40/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001340]  ? __switch_to_asm+0x34/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001341]  ? __switch_to_asm+0x40/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001342]  ? __switch_to_asm+0x34/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001343]  ? __switch_to_asm+0x40/0x70
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001346]  ? wait_woken+0x80/0x80
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001348]  ? try_to_del_timer_sync+0x4d/0x80
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001350]  kjournald2+0xc1/0x260
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001351]  ? wait_woken+0x80/0x80
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001353]  kthread+0xf8/0x130
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001355]  ? commit_timeout+0x10/0x10
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001356]  ? kthread_bind+0x10/0x10
Jul  8 15:09:13 xcp-ng-ha-1 kernel: [99081.001357]  ret_from_fork+0x22/0x40
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830064] INFO: task jbd2/drbd1000-8:736989 blocked for more than 120 seconds.
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830071]       Tainted: G           O      4.19.0+1 #1
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830074] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830076] jbd2/drbd1000-8 D    0 736989      2 0x80000000
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830078] Call Trace:
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830086]  ? __schedule+0x2a6/0x880
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830088]  schedule+0x32/0x80
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830091]  jbd2_journal_commit_transaction+0x260/0x1896
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830093]  ? __switch_to_asm+0x34/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830094]  ? __switch_to_asm+0x40/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830095]  ? __switch_to_asm+0x34/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830096]  ? __switch_to_asm+0x40/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830097]  ? __switch_to_asm+0x34/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830098]  ? __switch_to_asm+0x40/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830099]  ? __switch_to_asm+0x34/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830100]  ? __switch_to_asm+0x40/0x70
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830103]  ? wait_woken+0x80/0x80
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830105]  ? try_to_del_timer_sync+0x4d/0x80
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830107]  kjournald2+0xc1/0x260
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830108]  ? wait_woken+0x80/0x80
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830110]  kthread+0xf8/0x130
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830112]  ? commit_timeout+0x10/0x10
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830113]  ? kthread_bind+0x10/0x10
Jul  8 15:11:14 xcp-ng-ha-1 kernel: [99201.830114]  ret_from_fork+0x22/0x40
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731530] drbd_reject_write_early: 2 callbacks suppressed
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731541] Aborting journal on device drbd1000-8.
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731544] Buffer I/O error on dev drbd1000, logical block 131072, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731546] JBD2: Error -5 detected when updating journal superblock for drbd1000-8.
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731549] EXT4-fs error (device drbd1000) in ext4_reserve_inode_write:5872: Journal has aborted
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731556] Buffer I/O error on dev drbd1000, logical block 0, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731562] EXT4-fs (drbd1000): I/O error while writing superblock
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731565] EXT4-fs error (device drbd1000) in ext4_orphan_add:2822: Journal has aborted
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731569] Buffer I/O error on dev drbd1000, logical block 0, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731571] EXT4-fs (drbd1000): I/O error while writing superblock
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731575] EXT4-fs error (device drbd1000) in ext4_reserve_inode_write:5872: Journal has aborted
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731578] Buffer I/O error on dev drbd1000, logical block 0, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731581] EXT4-fs (drbd1000): I/O error while writing superblock
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731586] EXT4-fs error (device drbd1000) in ext4_truncate:4527: Journal has aborted
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731589] Buffer I/O error on dev drbd1000, logical block 0, lost sync page write
Jul  8 15:11:51 xcp-ng-ha-1 kernel: [99238.731592] EXT4-fs (drbd1000): I/O error while writing superblock
```

On the drbd-monitor side, here's what happens: we failed
to stop the controller, and it was subsequently killed by systemd.
Then an attempt to unmount `/var/lib/linstor` failed completely:
```
Jul  8 15:10:15 xcp-ng-ha-1 systemd[1]: linstor-controller.service stop-final-sigterm timed out. Killing.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: linstor-controller.service still around after final SIGKILL. Entering failed mode.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: Stopped drbd-reactor controlled linstor-controller.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: Unit linstor-controller.service entered failed state.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: linstor-controller.service failed.
Jul  8 15:11:45 xcp-ng-ha-1 systemd[1]: Stopping drbd-reactor controlled var-lib-linstor...
Jul  8 15:11:48 xcp-ng-ha-1 Satellite[739516]: 2025-07-08 15:11:48.312 [MainWorkerPool-8] INFO  LINSTOR/Satellite/000010 SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807
Jul  8 15:11:48 xcp-ng-ha-1 Satellite[739516]: 2025-07-08 15:11:48.447 [MainWorkerPool-8] INFO  LINSTOR/Satellite/000010 SYSTEM - SpaceInfo: xcp-sr-linstor_group_thin_device -> 430950298/444645376
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: var-lib-linstor.service: control process exited, code=exited status=32
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: Stopped drbd-reactor controlled var-lib-linstor.
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: Unit var-lib-linstor.service entered failed state.
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: var-lib-linstor.service failed.
Jul  8 15:11:51 xcp-ng-ha-1 systemd[1]: Stopping Promotion of DRBD resource xcp-persistent-database...
```

In this situation, the host will not be able to run the controller
again without manually unmounting `/var/lib/linstor`. The solution
to this problem is to attempt a `umount` call with the lazy option.
This option can be dangerous in many situations, but here we don't
have much choice:
- The DRBD resource is technically no longer PRIMARY and therefore
  no longer accessible
- The controller has been stopped
- No writing is possible
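
A minimal sketch of this lazy-unmount fallback (the wrapper is illustrative, not the actual drbd-monitor code):

```python
import subprocess

# Sketch: try a normal unmount first, fall back to a lazy unmount.
def umount_linstor_database(path='/var/lib/linstor'):
    try:
        subprocess.check_call(['umount', path])
    except subprocess.CalledProcessError:
        # Last resort: lazy unmount. Acceptable here because the DRBD resource
        # is no longer PRIMARY, the controller is stopped and no write can occur.
        subprocess.check_call(['umount', '-l', path])
```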

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Try to use the host_OpaqueRef to access the primary, then try on the master
host if that doesn't work, then find the primary with the LINSTOR API or,
if there is no primary, any other host.

Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
fix(linstor): prevent use of e before assignment in nested try-except
fix(linstor): use util.get_master_ref to get the master ref
fix(linstor): log host_ref instead of UUID to prevent XAPI call
fix(log_failed_call): set error value for the call without an actual error
fix(linstorhostcall): use next iter instead of list conversion
cleanup(linstor): remove currently unused get_primary function

Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
Co-authored-by: Damien Thenot <damien.thenot@vates.tech>
Co-authored-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Upstream patch of ae10349 is incorrect.

All "@mock.patch('blktap2.VDI.PhyLink', autospec=True)" lines must be removed
because PhyLink is mocked globally.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
- Use specific DRBD options to detect failures within a short delay.
- Use these options to control quorum with drbd-reactor.
- Provide a better compromise in terms of availability.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Impacted functions: `_get_volumes_info` and `_get_volume_node_names_and_size`.

Before this change "usable_size" validity was
checked too early and which could lead to an exception for
no good reason while the size could be known on at
least one host despite an issue on other machines.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
…ll context

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
…mutators

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
The session attr is not set during "attach/detach calls from config".
In this context the local method must always be called.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
A change in lvm2 (https://github.com/xcp-ng-rpms/lvm2/pull/3/files)
introduces an issue in LargeBlockSR: `/dev/` is no longer scanned, meaning
the loop device is never used for VG activation. So we must add a custom
scan parameter to LVM commands.
We also now systematically call _redo_vg_connection to use our
custom parameters to enable the LV on the correct device before calling
`EXTSR.attach()`.
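
As an illustration only, an LVM command can be given an explicit device scan configuration on its command line; the exact parameter used by LargeBlockSR may differ:

```python
import subprocess

# Hedged sketch: force /dev/ into the LVM device scan for a single command so
# the loop device is seen again during VG activation.
def activate_vg_with_dev_scan(vg_name):
    subprocess.check_call([
        'vgchange', '-ay', vg_name,
        '--config', 'devices { scan = [ "/dev" ] }'
    ])
```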

Signed-off-by: Damien Thenot <damien.thenot@vates.tech>
This is not done on each and every implementation of SR but only on the ones that call cleanup.start_gc_service (like FileSR)
and on the classes that inherit from them and don't call super on detach.

This is to prevent useless error logs like "Failed to stop xxx.service: Unit xxx.service not loaded".

Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
When the pool master is changed and the new master doesn't have a local DB path,
`get_database_path` fails during the SR.scan call.
This patch allows creating a diskless path if necessary.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
…104)

In `_request_device_path`:
Before this change, an exception was thrown when a resource was missing,
but not when the returned path was empty. Now it's raised in both cases.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Add a way in `linstorvolumemanager` to verify that all nodes are using the same LINSTOR version at init.
Raise an error early if a mismatch is detected so that SR ops are properly disabled with clear feedback to the user.
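
A hedged sketch of such a check; `versions_by_node` is assumed to be a mapping built from LINSTOR node information, not a python-linstor API call:

```python
# Sketch: refuse to continue if the nodes report different LINSTOR versions.
def assert_same_linstor_version(versions_by_node):
    distinct = set(versions_by_node.values())
    if len(distinct) > 1:
        detail = ', '.join(
            '{}={}'.format(node, version)
            for node, version in sorted(versions_by_node.items())
        )
        raise Exception('LINSTOR version mismatch across nodes: ' + detail)
```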

Signed-off-by: Antoine Bartuccio <antoine.bartuccio@vates.tech>
Avoid a Python version mismatch that pulls incompatible dependencies in GitHub Actions when running unit tests.

Signed-off-by: Antoine Bartuccio <antoine.bartuccio@vates.tech>
- Simplify scan logic by removing XAPI calls.
- Create one XAPI session / LINSTOR connection in each worker thread.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.tech>
@Wescoeur Wescoeur force-pushed the ran-xostor-fast-scan branch from 7fa711a to a1eaa0e Compare December 1, 2025 23:39
Comment on lines +1227 to +1231
all_executor_load.append(load)

session = XenAPI.xapi_local()
session.xenapi.login_with_password('root', '', '', 'SM')
load._session = session


Not sure why you mutate load after adding it to all_executor_load. IMHO it would read better if you did:

            load._session = session
            all_executor_load.append(load)

Comment on lines 3544 to 3570
def init_executor_thread():
    class Load(object):
        def __init__(self):
            self._session = None

        def cleanup(self):
            if self._session:
                self._session.xenapi.session.logout()

    load = Load()

    all_executor_load.append(load)

    session = XenAPI.xapi_local()
    session.xenapi.login_with_password('root', '', '', 'SM')
    load._session = session
    executor_data.session = session

    linstor = LinstorVolumeManager(
        self._linstor.uri,
        self._linstor.group_name,
        repair=False,
        logger=Util.log
    )
    executor_data.linstor = linstor

    executor_data.vhdutil = LinstorVhdUtil(session, linstor)


Most of this code seems to be repeated. Would it be possible to make it a util?

Both the executor_data and all_executor_load could be passed as arguments, maybe along with an additional function to run some specific code (like the LINSTOR part).
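
For instance, something along these lines (a sketch only, reusing the names from the diff above; `extra_init` is a hypothetical callback for the driver-specific part):

```python
def make_executor_initializer(executor_data, all_executor_load, extra_init=None):
    # Shared per-worker-thread setup: one XAPI session per thread, registered
    # in all_executor_load for later cleanup.
    def init_executor_thread():
        load = Load()  # Load as defined in the diff above

        session = XenAPI.xapi_local()
        session.xenapi.login_with_password('root', '', '', 'SM')
        load._session = session
        executor_data.session = session
        all_executor_load.append(load)

        if extra_init:
            extra_init(session, executor_data)  # e.g. the LINSTOR-specific setup

    return init_executor_thread
```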

Comment on lines +3555 to +3559
all_executor_load.append(load)

session = XenAPI.xapi_local()
session.xenapi.login_with_password('root', '', '', 'SM')
load._session = session


Same as below, I'd prefer to not mutate load after appending it.

Comment on lines +3580 to +3590
try:
    with ThreadPoolExecutor() as executor:
        for info in executor.map(load_info, pending_vdi_uuids):
            all_vdi_info[info.uuid] = info
finally:
    for load in all_executor_load:
        try:
            load.cleanup()
        except Exception as e:
            Util.log(f"Failed to clean load executor: {e}")
    all_executor_load.clear()


This is also repeated twice. I think it could be made generic at some point.
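
For example, the repeated pattern could be wrapped once (sketch only, reusing the names from the diff above):

```python
def run_with_executor_cleanup(fn, all_executor_load):
    # Run the executor-based work, then always clean up the per-thread loads.
    try:
        return fn()
    finally:
        for load in all_executor_load:
            try:
                load.cleanup()
            except Exception as e:
                Util.log(f"Failed to clean load executor: {e}")
        all_executor_load.clear()
```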

# Make sure this call never stucks because this function can be called
# during HA init and in this case we can wait forever.
session = util.timeout_call(10, util.get_localAPI_session)
session = util.get_localAPI_session()

@Millefeuille42 Millefeuille42 Dec 8, 2025


Could you please explain in comments or at least in the PR why you drop the timeout?
