-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multipath-tools 0.11.0 #104
Draft
mwilck
wants to merge
61
commits into
opensvc:master
Choose a base branch
from
openSUSE:queue
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The @v1 releases are deprecated. https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/ Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
We can't use "container:" any more because upload-artifact@v4 doesn't work on Debian Jessie. Signed-off-by: Martin Wilck <mwilck@suse.com>
Running "make" in a container and "make clean" outside doesn't work (access right issues). So just use separate jobs. Signed-off-by: Martin Wilck <mwilck@suse.com>
Avoid "text file busy" error on GitHub. dmevents-test: Text file busy Makefile:74: recipe for target 'dmevents.out' failed Signed-off-by: Martin Wilck <mwilck@suse.com>
checker_check() is now a void function that stores the path state in the checker struct. It can be retrieved later using checker_get_state(). Right now, this is called immediately after checker_check(), but in the future, it can be deferred to another time. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
This patch adds a new optional symbol for the dynamic path checker libraries, libcheck_pending. This is currently unused, but can be called on pending checkers to check if they've completed and return their value. The "tur" and "directio" checkers are the only ones which can return PATH_PENDING. They now implement libcheck_pending() as a wrapper around the code that libcheck_check uses to wait for pending paths, which has been broken out into its own function. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
When the tur and directio checkers start an asynchronous checker, they now immediately return with the path in PATH_PENDING, instead of waiting in checker_check(). Instead the wait now happens in checker_get_state(). Additionally, the directio checker now waits for 1 ms, like the tur checker does. Also like the tur checker it now only waits once. If it is still pending after the first call to checker_get_state(). It will not wait at all on future calls, and will just process the already completed IOs. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Make the directio tests work with libcheck_pending() being separate from libcheck_check Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
get_state() is now split into start_checker(), which runs the checker but doesn't wait for async checkers, and get_state(), which returns the state from the checker, after waiting for async checkers if necessary. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Instead of pp->offline being a binary value, change it to show the actual result of looking at the sysfs state, and rename it to pp->sysfs_state. Also change the function name from path_offline() to path_sysfs_state(). Adapt the tests for pp->offline. This should not change how multipath currently works. pp->sysfs_state will be used in future patches. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
check_path_state() is now split into start_path_check(), which calls path_sysfs_state() and if the path is up also calls start_checker(), and get_new_state() which gets the new state from either pp->sysfs_state or get_state(). Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Move the code that starts the path checker from do_check_path() into check_path(), rename the remainder of do_check_path() to update_path_state() and call that from check_path(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
Split out the code that updates a path's state and sets up the next check time into its own function, update_path(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
Split handle_uninitialized_path() into check_uninitialized_path, which handles udev retriggers for INIT_MISSING_UDEV paths and starts the path checker for INIT_FAILED and INIT_NEW paths, and update_uninitialized_path() which gets the path checker result and reruns pathinfo if the path is up. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
Instead of checking and updating each path, the checkerloop now starts the checkers on all the paths in check_paths(), and then goes back and updates all the paths in update_paths(). Since the async checkers use an absolute time to wait for before returning PATH_PENDING, only one checker actually needs to be waited for in update_paths(). The rest will already have reached their timeout when update_path() is called for them. The check_paths() and update_paths() loop over the pathvec instead of looping through the multipath device paths to avoid having to restart checking of the device paths when the multipath device needs to be resynced while updating the paths. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
Now that multipathd can drop the vecs lock and sleep in-between when the checker runs and when the path gets updated, it is possible for the user to either fail or reinstate the path in this window. If a path gets manually failed in the window between when the checker starts and the path is updated, the checker will have already been started, on the path, and if multipathd read the results like normal, it could simply reinstate the path. To avoid this, when a path checker is disabled, the checker path_state and message are updated, just like they are when checker_check() is run on a disabled path, so that when checker_get_state() is called, it will always return the same results for a disabled checker, regardless of when it was disabled. Reinstating the path doesn't cause many problems, but still can be improved. Since the checker was disabled when it would have been started, it didn't run during this cycle, only the kernel state will get updated. The rest of the path update changes won't happen until the next time the checker runs. This is the case regardless of whether or not the path was reinstated in the window between when the checker starts and the path is updated. To make reinstated paths get updated sooner, pp->tick is now set to 1 when the path is reinstated. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Instead of updating the path priorities and possibly reloading the multipath device when each path is updated, wait till all paths have been updated, and then go through the multipath devices updating the priorities once, reloading if necessary. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
The code needs to check that pp->pgindex equals pp->mpp->bestpg and that they aren't both 0 (which is an invalid pathgroup id). Since we're already checking that they are equal, we only need to check one of them to see if it's non-zero. Instead of checking if pp->pgindex is non-zero for every path, just check mpp->bestpg once for the multipath device and return immediately if it's zero. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
The way that multipathd was handling priority refreshes had some issues. Some of the multipath prioritizers can hang when run on a failed path. Multipathd was only skipping paths in PATH_DOWN, but there are other states where the prioritizer is also likely to hang, such as PATH_TIMEOUT, PATH_SHAKY, and to a lesser extent, PATH_DELAYED. Also, before the recent patch splitting the priority updating from the path checking, multipathd wasn't consistent with which states would cause paths to get their priorities updated. If a path changed its state to anything other than PATH_UP or PATH_GHOST, it wouldn't get its priority updated. But if a path kept the same state its priority would get updated as long at the state wasn't PATH_DOWN. For safety's sake, a path's priority should only get refreshed when its in the PATH_UP or PATH_GHOST state. This shouldn't cause problems. Only paths that are in the PATH_UP or PATH_GHOST state are usable by the kenel and contibute to the pathgroup's priority. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Since we just returned if newstate wasn't PATH_UP or PATH_GHOST, it must obviously be PATH_UP or PATH_GHOST at this point in the code. We don't even wrap all the code for dealing with a path that just came up in this if-block, It's only the persistent resrvation code. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
If reload_and_sync_map() removes the multipath device, deferred_failback_tick() needs to decrement the counter so that it doesn't skip the following device. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Add a new optional checker class function, libcheck_need_wait() and a new function to call it, checker_need_wait(). This can be used to see if a path_checker is currently running. This will be used to determine if there are pending checkers that need to be waited for. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
Disable waiting in the libcheck_pending() checker class functions. Instead, they now just check if the checker has already completed. A future patch will re-add waiting outside of libcheck_pending(). Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Multipath again waits for running checkers to complete. In pathinfo(), mutipath will just wait 1ms if the checker is running. In checkerloop(), if there are running checkers, multipathd will drop the vecs lock and wait for 5ms. The difference it wait times is because multipathd cannot drop the vecs lock in pathinfo. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Make the directio tests work with no waiting in the libcheck_pending() and the new libcheck_need_wait() call. Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Useless, automatically handled by the kernel since 4.3. (4.3 was released nine years ago) Cc: Martin Wilck <mwilck@suse.com> Cc: Benjamin Marzinski <bmarzins@redhat.com> Cc: Christophe Varoqui <christophe.varoqui@opensvc.com> Cc: DM-DEVEL ML <dm-devel@lists.linux.dev> Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
We import DM_COLDPLUG_SUSPENDED in all code flows below mpath_coldplug_end. Clarify this in the code. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Since b22c273 ("11-dm-mpath.rules: Don't force activation while device is suspended"), we've handled the case where a device is suspended while an uevent is processed (e.g. because multipathd is reloading the map again at the same time). But we were missing the case where The device had never been initialized before. If .MPATH_DEVICE_READY_OLD was empty, we'd jump to scan_import without setting MPATH_DEVICE_READY to 0. This can cause a device not to be fully activated at boot time, because in follow-up uevents we'd assume that the device had already been set up. Treat the case in which an uevent is processed for a previously not fully set-up, suspended device like other situations where we set MPATH_DEVICE_READY to 0. Fixes: b22c273 ("11-dm-mpath.rules: Don't force activation while device is suspended") Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Old licences have been modified, because FSF postal address was changed: https://lists.gnu.org/archive/html/info-gnu/2024-09/msg00000.html https://www.fsf.org/blogs/community/fsf-office-closing-party https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt https://www.gnu.org/licenses/old-licenses/lgpl-2.0.txt https://www.gnu.org/licenses/old-licenses/lgpl-2.1.txt Cc: Martin Wilck <mwilck@suse.com> Cc: Benjamin Marzinski <bmarzins@redhat.com> Cc: Christophe Varoqui <christophe.varoqui@opensvc.com> Cc: DM-DEVEL ML <dm-devel@lists.linux.dev> Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
After 7dc7b90 ("multipathd: use timestamps to tell when the directio checker timed out"), we need to actually sleep to simulate timeouts. Signed-off-by: Martin Wilck <mwilck@suse.com>
dm_get_maps() traverses the entire list of dm maps. We shouldn't give up just because probing a single map failed. Based on an earlier patch from Benjamin Marzinski <bmarzins@redhat.com> Fixes: bf3a4ad ("libmultipath: simplify dm_get_maps()") Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
multi-target maps are more like maps with a single non-multipath target than like maps with no target at all. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Before checking the target details, first check that the device has a "mpath-" dm uuid prefix. If it doesn't then we can just ignore the device. This keeps multipath from printing error messages for non-multipath devices with multiple targets for instance. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
Instead of seperately calling is_mpath_uuid(), just use MAPINFO_CHECK_UUID when calling libmp_mapinfo. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
Issuing multipathd commands when the daemon wasn't running didn't print anything. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
systemd will now restart multipathd on failure unless it has already been started 3 times in the last 30 seconds. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com>
The semantics of MAPINFO_CHECK_UUID, MAPINFO_MPATH_ONLY and MAPINFO_PART_ONLY are confusing. Fix that by supporting UUID check for partitions, too. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Don't try to remove any non-standard partition mappings. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Removing a partition map is an important action that should be logged at verbosity level 3. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Unless MAPINFO_CHECK_UUID was set and we know that we are looking at a map with a multipath UUID, encountering non-multipath maps is not an error. Don't log this at verbosity level 2. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Even at verbosity level 3, it isn't interesting and actually disturbing to see every successful state and pending state logged by multipathd. Suppress these messages at level 3. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
If pp->dev_loss is DEV_LOSS_TMO_UNSET and min_dev_loss is 0 (which is the case if no_path_retry is NO_PATH_RETRY_FAIL or NO_PATH_RETRY_UNDEF), we will set pp->dev_loss to 0, which is wrong. Fix it. Fixes: 058b5f5 ("libmultipath: fix dev_loss_tmo even if not set in configuration") Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
- "prio_update" added to struct multipath - "oldstate" added to struct path - "path_state" added to struct checker - "pending" and "need_wait" added to struct checker_class - "path_offline" removed - function additions
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reported by coverity: "i--" may cause an underflow, which will again cause an overflow when the loop continues. Use a signed int for loops like this to make coverity happy. Signed-off-by: Martin Wilck <mwilck@suse.com>
bmarzins
approved these changes
Nov 13, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Typo at: https://github.com/openSUSE/multipath-tools/blob/4c3ade25dcbdc296bd74ca2ac971610b7ef05e12/README.md?plain=1#L37 |
We need one more fix, for systemd/systemd#35135. I'll add this asap. |
Signed-off-by: Martin Wilck <mwilck@suse.com>
The ABI should never change on a stable branch. This workflow asserts that. Signed-off-by: Martin Wilck <mwilck@suse.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Stable branches and new versioning scheme
Beginning with 0.11, the second digit in the multipath-tools version will be
used for upstream "major" releases. The 3rd and last digit will denote stable
releases in the future, containing only bug fixes on top of the last major
release. These bug fixes will be tracked in stable branches.
See README.md for additional information.
multipath-tools 0.11.0, 2024/11
User-visible changes
multipathd.service
such that multipathd will nowrestart after a failure or crash.
Fixes #100.
Other major changes
Rework of the path checking algorithm
This is a continuation of the checker-related work that went into 0.10.0. For
asynchronous checker algorithms (i.e. tur and directio), the start of the
check and the retrieval of the results are done at separate points in time,
which reduces the time for waiting for the checker results of individual paths
and thus strongly improves the performance of the checker algorithm, in
particular on systems with a large a amount of paths.
The algorithm has the following additional properties:
they have not paths
loop, the multipath device will only get synchronized with the kernel once.
the same time (at least as much as their differing checker intervals will
allow).
out as much as possible so multipathd doesn't do all of it's checking in
a burst.
the uninitialized paths are handled after all the adopted paths are
handled).
Bug fixes
of device mapper devices, in particular devices with multiple DM targets.
The problem was introduced in 0.10.0.
Fixes #102.
activated during boot if a cold plug uevent is processed for a previously
not configured multipath map while this map was suspended. This problem existed
since 0.9.8.
no_path_retry fail
and no settingfor
dev_loss_tmo
might get thedev_loss_tmo
set to 0, causing thedevice to be deleted immediately in the event of a transport disruption.
This bug was introduced in 0.9.6.
(
failback
value > 0 inmultipath.conf
), some maps might fail back laterthan configured. The problem existed since 0.9.6.
Other
hardware_handler
have been removed from theinternal hardware table. These settings have been obsoleted by the Linux
kernel 4.3 and later, which automatically selects hardware handlers when
SCSI devices are added. See the notes about
SCSI_DH_MODULES_PRELOAD
inREADME.md.
Free Software Foundation.
Shortlog
@bmarzins (28):
libmultipath: store checker_check() result in checker struct
libmultipath: add missing checker function prototypes
libmultipath: split out the code to wait for pending checkers
libmultipath: remove pending wait code from libcheck_check calls
multipath-tools tests: fix up directio tests
libmultipath: split get_state into two functions
libmultipath: change path_offline to path_sysfs_state
multipathd: split check_path_state into two functions
multipathd: rename do_check_path to update_path_state
multipathd: split check_path into two functions
multipathd: split handle_uninitialized_path into two functions
multipathd: split check_paths into two functions
multipathd: fix "fail path" and "reinstate path" commands
multipathd: update priority once after updating all paths
multipathd: simplify checking for followover_should_failback
multipathd: only refresh prios on PATH_UP and PATH_GHOST
multipathd: remove pointless check
multipathd: fix deferred_failback_tick for reload removes
libmultipath: add libcheck_need_wait checker function
libmultipath: don't wait in libcheck_pending
multipathd: wait for checkers to complete
multipath-tools tests: fix up directio tests
multipathd: checker port_state before setting it.
multipathd: use timestamps to tell when the directio checker timed out
libmultipath: check DM UUID earlier in libmp_mapinfo__
libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath
multipathd: print an error when failing to connect to multipathd
multipathd.service: restart multipathd on failure
@mwilck (27):
GitHub workflows: use {upload,download}-artifact@v4
GitHub workflows: update dawidd6/action-download-artifact
Github workflows: native.yml: use mosteo-actions/docker-run
GitHub workflows: native.yml: use extra job for clang
GitHub workflows: native.yaml: make test and archive separately
11-dm-mpath.rules.in: import DM_COLDPLUG_SUSPENDED only once
11-dm-mpath.rules.in: handle inactive suspended devices correctly
11-dm-mpath.rules.in: clarify DM_ACTIVATION logic
11-dm-mpath-rules.in: skip one .DM_NOSCAN check
11-dm-mpath.rules.in: set .DM_NOSCAN if MPATH_UNCHANGED is set
libmpathutil: avoid -Wcast-function-type-mismatch error with clang 19
multipath-tools tests: actually sleep in directio test
libmultipath: dm_get_maps(): don't bail out for single-map failures
libmultipath: libmp_mapinfo(): return DMP_NO_MATCH for multi-target maps
libmultipath: make MAPINFO_CHECK_UUID work with partitions
libmultipath: check map UUID in do_foreach_partmaps
libmultipath: increase log level for removing partitions
libmultipath: reduce log level of libmp_mapinfo() messages
libmultipath: don't log boring state messages at level 3
libmultipath: don't set dev_loss_tmo to 0 for NO_PATH_RETRY_FAIL
README.md: mention NEWS.md in Changelog section
NEWS.md: add section for 0.11.0
README.md: mention stable branches
NEWS.md: mention stable branches
libmultipath, libmpathpersist: bump ABI versions
libmultipath: bump version to 0.11.0
multipathd: fix an unsigned int ovwerflow
@xosevp (4):
multipath-tools: remove hwhandler from all hwtable configs
multipath-tools: hwtable housekeeping
multipath-tools: update comments in hwtable
multipath-tools: refresh fsf licences