forked from xapi-project/sm
Increase DRBD Net/ping-timeout #45
Open
benjamreis wants to merge 133 commits into 2.30.8-8.2-linstor-fixes-staging from 2.30.8-8.2-linstor-increase-ping-timeout
…probe calls Signed-off-by: Mark Syms <mark.syms@citrix.com> Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
This was a patch added to the sm RPM git repo before we had this forked git repo for sm in the xcp-ng github organisation.
The driver is needed to transition to the ext driver. Users who upgrade from XCP-ng <= 8.0 need a working driver so that they can move the VMs out of the ext4 SR and delete the SR. Not keeping that driver would force such users to upgrade to 8.1 first, convert their SR, then upgrade to a higher version. However, like in XCP-ng 8.1, the driver will refuse any new ext4 SR creation.
Some important points:
- linstor.KV must use an identifier name that starts with a letter (so it uses a "sr-" prefix).
- Encrypted VDIs are supported with the key_hash attribute (not tested, experimental).
- When a new LINSTOR volume is created on a host (via snapshot or create), the remaining diskless devices are not necessarily created on other hosts. So if a resource definition exists without a local device path, we ask LINSTOR for it. Wait 5s for symlink creation when a new volume is created => 5s is purely arbitrary, but this guarantees that we do not try to access the volume if the symlink has not yet been created by the udev rule.
- The provisioning can be changed using the device config 'provisioning' param.
- We can only increase volume size (See: LINBIT/linstor-server#66); it would be great if we could shrink volumes to limit the space used by the snapshots.
- Inflate/Deflate can only be executed on the master host; a linstor-manager plugin is present to do this from slaves. The same plugin is used to open LINSTOR ports + start controller.
- Use a `total_allocated_volume_size` method to have a good idea of the reserved memory. Why? Because `physical_free_size` is computed using the LVM used size. In the case of thick provisioning that's ok, but when thin provisioning is chosen LVM returns only the allocated size using the used block count. So this method solves this problem: it takes the fixed virtual volume size of each node to compute the required size to store the volume data.
- Call vhd-util on remote hosts using the linstor-manager when necessary, i.e. when vhd-util is called to get VHD info, the DRBD device can be in use (and unusable by external processes), so we must use the local LVM device that contains the DRBD data, or a remote disk if the DRBD device is diskless.
- If a DRBD device is in use when vhdutil.getVHDInfo is called, we must have no errors. So a LinstorVhdUtil wrapper is now used to bypass the DRBD layer when VDIs are loaded.
- Refresh PhyLink when unpause is called on DRBD devices: we must always recreate the symlink to ensure we have the right info. Why? Because if the volume UUID is changed in LINSTOR, the symlink is not directly updated. When live leaf coalesce is executed we have these steps: "A" -> "OLD_A", then "B" -> "A". Without a symlink update, the previous "A" path is reused instead of the "B" path. Note: "A", "B" and "OLD_A" are UUIDs.
- Since linstor python modules are not present on every XCP-ng host, module imports are protected by try... except... blocks.
- Provide a linstor-monitor daemon to check master changes.
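The symlink wait described in the list above could be written as a bounded poll rather than a flat sleep. The following is only a minimal sketch under that assumption: `wait_for_path` and its parameters are hypothetical names, not taken from the driver, and the 5 second default simply mirrors the arbitrary value mentioned in the commit message.

```python
import os
import time


def wait_for_path(path, timeout=5.0, interval=0.1):
    """Poll until `path` exists (symlinks included) or `timeout` seconds elapse.

    Hypothetical helper: returns True as soon as the path shows up,
    False if it is still missing once the deadline has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        # lexists() is True for a symlink even if its target is not ready yet.
        if os.path.lexists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would check the return value and raise its own error when the udev rule has not created the symlink in time, instead of accessing the volume blindly.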
- Check that "create" does not succeed without the zfs packages
- Check that "scan" fails if the path is not mounted (not a ZFS mountpoint)
Some QNAP devices do not provide an ACL when fetching NFS mounts. In this case the assumed ACL should be: "*". This commit fixes the crash when attempting to access the nonexistent ACL.
Relevant issues:
- xapi-project#511
- xcp-ng/xcp#113
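As an illustration of the fallback described above, here is a minimal sketch that assumes each export is reported as a line of the form `<path> <acl>`, with the ACL column sometimes missing. `parse_export_line` is a hypothetical helper, not the function patched in sm.

```python
def parse_export_line(line):
    """Split one export line into (path, acl).

    Hypothetical helper: when the server omits the ACL column,
    fall back to "*" instead of indexing a missing field.
    """
    parts = line.split()
    if not parts:
        raise ValueError('empty export line')
    path = parts[0]
    acl = parts[1] if len(parts) > 1 else '*'
    return path, acl
```

The point is only that the missing column is handled explicitly rather than raising an IndexError.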
Co-authored-by: Piotr Robert Konopelko <piotr.konopelko@moosefs.pro> Signed-off-by: Aleksander Wieliczko <aleksander.wieliczko@moosefs.pro> Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
`umount` should not be called when `legacy_mode` is enabled, otherwise a mounted dir used during SR creation is unmounted at the end of the `create` call (and also when a PBD is unplugged) in `detach` block. Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
A sm-config boolean param `subdir` is available to configure where to store the VHDs:
- In a subdir with the SR UUID, the new behavior
- In the root directory of the MooseFS SR
By default, new SRs are created with `subdir` = True. Existing SRs are not modified and continue to use the folder that was given at SR creation, directly, without looking for a subdirectory. Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
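The path selection described above can be sketched as follows. This is an assumption-laden illustration: `vhd_directory` is a hypothetical function, sm-config values are assumed to be stored as strings, and a missing `subdir` key is assumed to mean an existing SR that keeps the legacy layout.

```python
import os


def vhd_directory(mount_root, sr_uuid, sm_config):
    """Return the directory that holds the VHDs of an SR.

    Hypothetical helper. A missing 'subdir' key is treated as False,
    so SRs created before the option existed keep using the root.
    """
    use_subdir = str(sm_config.get('subdir', 'false')).lower() == 'true'
    if use_subdir:
        return os.path.join(mount_root, sr_uuid)
    return mount_root
```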
Ensure all shared drivers are imported in `_is_open` definition to register them in the driver list. Otherwise this function always fails with a SRUnknownType exception. Also, we must add two fake mandatory parameters to make MooseFS happy: `masterhost` and `rootpath`. Same for CephFS with: `serverpath`. (NFS driver is directly patched to ensure there is no usage of the `serverpath` param because its value is equal to None.) `location` param is required to use ZFS, to be more precise, in the parent class: `FileSR`. Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
SR_CACHING offers the capacity to use IntelliCache, but this feature is only available using NFS SR. For more details, the implementation of `_setup_cache` in blktap2.py uses only an instance of NFSFileVDI for the shared target. Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
When static VDIs are used there are no snapshots and we don't want to call XAPI methods. Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
This file is meant to remain unchanged and regularly updated along with the SM component. Users can create a custom configuration file in /etc/multipath/conf.d/ instead. Signed-off-by: Samuel Verschelde <stormi-xcp@ylix.fr> (cherry picked from commit b44d3f5)
Meant to be installed as /etc/multipath/conf.d/custom.conf for users to have an easy entry point for editing, as well as information on what will happen to this file through future system updates and upgrades. Signed-off-by: Samuel Verschelde <stormi-xcp@ylix.fr> (cherry picked from commit 18b79a5)
Update Makefile so that the file is installed along with sm. Signed-off-by: Samuel Verschelde <stormi-xcp@ylix.fr>
Otherwise the SIGALRM signal can be emitted after the execution of the given user function. Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
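The race mentioned above is typical of alarm-based timeouts: if the pending alarm is not cancelled once the user function returns, SIGALRM can still fire later. A minimal sketch of a guarded wrapper is shown below. `run_with_timeout` and `TimeoutException` are illustrative names, not the helper that this commit modifies, and the sketch assumes a Unix host and a call from the main thread.

```python
import signal


class TimeoutException(Exception):
    """Raised when the wrapped function exceeds its time budget."""


def run_with_timeout(seconds, function, *args, **kwargs):
    """Run `function` and raise TimeoutException after `seconds` seconds."""
    def handler(signum, frame):
        raise TimeoutException()

    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        return function(*args, **kwargs)
    finally:
        # Cancel the pending alarm so it cannot fire after the function
        # has returned, then restore the previous handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)
```

Cancelling in a `finally` block covers both the normal return and the case where the function raises its own exception.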
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Details:
- vdi_attach and vdi_detach are now exclusive
- lock volumes on slaves (when a vdi_xxx command is used) and avoid release if a timeout is reached
- load all VDIs only when necessary, i.e. only if at least one journal entry exists or if sr_scan/sr_attach is executed
- use a __slots__ attr in LinstorVolumeManager to increase performance
- use a cache directly in LinstorVolumeManager to reduce the network request count with LINSTOR
- try to always use the same LINSTOR KV object to limit network usage
- use a cache to avoid a new JSON parsing when all VDIs are loaded in LinstorSR
- limit the request count when LINSTOR storage pool info is fetched, using a fetch interval
- avoid a race condition in cleanup: check whether a volume is locked on a slave before modifying it
- ...
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
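The fetch-interval idea from the list above can be sketched as a small time-based cache. `IntervalCache` is a hypothetical class written for illustration and is not the structure used in LinstorVolumeManager; the injectable `clock` argument exists only to make the behavior easy to test.

```python
import time


class IntervalCache:
    """Cache the result of `fetch` and refresh it at most once per `interval` seconds."""

    def __init__(self, fetch, interval, clock=time.monotonic):
        self._fetch = fetch
        self._interval = interval
        self._clock = clock
        self._stamp = None  # time of the last successful fetch
        self._value = None

    def get(self):
        now = self._clock()
        # Refresh on first use or once the interval has elapsed.
        if self._stamp is None or now - self._stamp >= self._interval:
            self._value = self._fetch()
            self._stamp = now
        return self._value
```

With such a wrapper, repeated reads of storage pool info within the interval would reuse the cached value instead of issuing a new request.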
…alled outside module Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
…te allocated size stats Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
…_from_config is executed Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Now, we can:
- Start a controller on any node
- Share the LINSTOR volume list using a specific volume "xcp-persistent-database"
- Use the HA with "xcp-persistent-ha-statefile" and "xcp-persistent-redo-log" volumes
- Create the nodes automatically during SR creation
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
…mes when master satellite is down

Steps to reproduce:
- Ensure the linstor satellite is not running on the master host, otherwise stop it
- Then restart the controller on the right host where the LINSTOR database is mounted
- Run the sr_attach command => All volumes will be forgotten

To avoid this, it's possible to restart the satellite on the master before the sr_attach command. Also, it's funny to see that you can start and stop the satellite just before the sr_attach, and the volumes will not be removed.

Explanations: In theory this bug is impossible because during the sr_attach execution, an exception is thrown (so sr_scan should not be executed) BUT there is a piece of code that is executed in SRCommand.py when sr_attach is called:

```python
try:
    return sr.attach(sr_uuid)
finally:
    if is_master:
        sr.after_master_attach(sr_uuid)
```

The exception is not immediately forwarded because the finally block must be executed first. And what is the implementation of after_master_attach?

```python
def after_master_attach(self, uuid):
    """Perform actions required after attaching on the pool master
    Return:
      None
    """
    self.scan(uuid)
```

Oh! Of course, a scan is always executed after an attach... What's the purpose of a scan if we can't correctly execute an attach command first? I don't know, but it's probably error-prone in a context like this one. When scan is called, we assume the SR is attached and all VDIs are loaded, but that's not the case because an exception has been thrown. To solve this problem we forbid the execution of the scan if the attach failed.

Signed-off-by: Ronan Abhamon <ronan.abhamon@vates.fr>
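One way to forbid the scan after a failed attach is to drop the `finally` block so the post-attach hook only runs when `attach` returns normally. The sketch below illustrates that idea with a stand-in `FakeSR` class; both `attach_then_scan` and `FakeSR` are hypothetical, and the actual change made to SRCommand.py may differ.

```python
class FakeSR:
    """Stand-in SR used only for this illustration."""

    def __init__(self, attach_fails):
        self.attach_fails = attach_fails
        self.scanned = False

    def attach(self, sr_uuid):
        if self.attach_fails:
            raise RuntimeError('attach failed')
        return 'attached'

    def scan(self, sr_uuid):
        self.scanned = True

    def after_master_attach(self, sr_uuid):
        self.scan(sr_uuid)


def attach_then_scan(sr, sr_uuid, is_master):
    # If attach() raises, the exception propagates immediately and the
    # scan below is never reached.
    result = sr.attach(sr_uuid)
    if is_master:
        sr.after_master_attach(sr_uuid)
    return result
```

With this shape, a failing `attach` leaves `scanned` at False, while a successful attach on the master still triggers the scan.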
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from 44f2ee3 to bf38210 (December 20, 2023 14:01)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 2 times, most recently from 4222231 to 3499398 (January 23, 2024 13:17)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 3 times, most recently from 150d510 to f87c3eb (February 12, 2024 19:56)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 6 times, most recently from 7238799 to f36a7a2 (April 29, 2024 15:22)
Nambrok force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from a217ee4 to 89f927e (May 7, 2024 13:22)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from 89f927e to 2b01dd1 (May 31, 2024 13:41)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 3 times, most recently from 76209bf to 8249dcc (June 13, 2024 11:25)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 2 times, most recently from f9e6a8e to e7ffbab (June 28, 2024 13:09)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 7 times, most recently from 51d4f89 to 3f63f6a (July 26, 2024 12:49)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 2 times, most recently from 028c295 to 31d150b (August 6, 2024 15:17)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from 04c2c93 to 0722952 (September 24, 2024 08:29)
Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from 0722952 to 119dc63 (October 3, 2024 15:34)
This would avoid falsely assuming that a node is dead.