-
Notifications
You must be signed in to change notification settings - Fork 181
Virtual disk: add migration case with vhostuser disk #6690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds a new libvirt migration test configuration file Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-09-24T08:01:27.899ZApplied to files:
📚 Learning: 2025-11-18T01:20:50.891ZApplied to files:
🪛 Ruff (0.14.6)libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py30-30: Function call with (S604) 32-32: Function call with (S604) 35-35: Function call with (S604) 37-37: Function call with (S604) 77-77: Probable insecure usage of temporary file or directory: "/tmp/vhost.sock" (S108) 168-168: Function call with (S604) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (4)
Comment |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (3)
51-54: Consider using consistent image creation method.The source host uses
libvirt.create_local_disk("file", ...)while the remote host usesdd if=/dev/zero. While both work, using consistent methods improves maintainability.
113-115: Strengthen vhostuser disk validation.The current check only verifies the string "vhostuser" exists in the XML dump. Consider validating the specific disk configuration (target device, socket path) for more robust verification.
- # Check if vhostuser disk is accessible in VM - if "vhostuser" not in virsh.dumpxml(vm_name).stdout_text: - test.fail("Check vhostuser disk in VM failed") + # Check if vhostuser disk is present in VM configuration + vmxml_check = vm_xml.VMXML.new_from_dumpxml(vm_name) + disk_list = vmxml_check.get_disk_all() + vhostuser_found = any(d.get('type') == 'vhostuser' for d in disk_list.values()) + if not vhostuser_found: + test.fail("Check vhostuser disk in VM failed")
88-94: Consider more robust service readiness check.The
sleep 3at the end is a timing-based approach to wait for the daemon to be ready. For more reliable tests, consider checking for socket file existence or service status instead of a fixed sleep.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg(1 hunks)libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.6)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
31-31: Function call with shell=True parameter identified, security issue
(S604)
33-33: Function call with shell=True parameter identified, security issue
(S604)
36-36: Function call with shell=True parameter identified, security issue
(S604)
38-38: Function call with shell=True parameter identified, security issue
(S604)
81-81: Probable insecure usage of temporary file or directory: "/tmp/vhost.sock"
(S108)
84-84: Local variable device_target is assigned to but never used
Remove assignment to unused variable device_target
(F841)
85-85: Use of possibly insecure function; consider using ast.literal_eval
(S307)
175-175: Function call with shell=True parameter identified, security issue
(S604)
180-180: Function call with shell=True parameter identified, security issue
(S604)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.11
- GitHub Check: Python 3.12
- GitHub Check: Python 3.8
- GitHub Check: Python 3.9
🔇 Additional comments (5)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg (2)
34-35: Empty variant placeholder.The
with_precopyvariant is defined but contains no variant-specific parameters. If precopy is the only migration mode being tested, this is acceptable as a placeholder for future postcopy or other variants.
29-29: Based on my investigation, I found that:
libvirt's memory backing XML element: The correct XML element is
<memoryBacking>with subelements like<source type="memfd"/>and<access mode="shared"/>— matching the structure in line 29.Limitation: I cannot access the repository to verify the exact key mapping layer (how "mb" is converted to XML), as the repository clone failed.
The structure of the vm_attrs appears consistent with libvirt's memoryBacking configuration, but without access to the framework's code that processes vm_attrs, I cannot definitively confirm whether "mb" is the correct key convention used by the testing framework.
Verify the key mapping in vm_xml processing code.
The
vm_attrsuses"mb"for memory backing configuration. While the internal structure (source_type,access_mode) matches libvirt's<memoryBacking>subelements, direct codebase verification is needed to confirm that"mb"is the correct key recognized by the testing framework's XML generator.libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (3)
20-39: LGTM with minor observation.The function correctly sets up the vhost socket service on the source host. The use of
shell=Trueis acceptable in this test framework context where command strings contain complex shell constructs.
156-187: LGTM!The cleanup function properly handles all resources: VM destruction, XML restoration, connection cleanup, and file/service cleanup on both local and remote hosts. The use of
ignore_status=Trueensures cleanup continues even if individual steps fail.
189-211: LGTM!The main execution flow follows good practices: version check, variable initialization, XML backup, and a try/finally block ensuring cleanup runs regardless of test outcome.
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Outdated
Show resolved
Hide resolved
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Outdated
Show resolved
Hide resolved
d3ec636 to
8ca0104
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (1)
80-80: Consider using a scoped, non‑/tmp socket pathUsing a fixed
/tmp/vhost.sockrisks collisions or stale leftovers because/tmpis world‑writable and shared across processes/tests. To keep things isolated, consider deriving a unique path (e.g. includevm_name, PID, or a UUID) under a test‑specific directory, such as something based ondata_dir.get_data_dir()or another per‑test temp dir.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg(1 hunks)libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T08:01:27.899Z
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
Applied to files:
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
🪛 Ruff (0.14.6)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
30-30: Function call with shell=True parameter identified, security issue
(S604)
32-32: Function call with shell=True parameter identified, security issue
(S604)
35-35: Function call with shell=True parameter identified, security issue
(S604)
37-37: Function call with shell=True parameter identified, security issue
(S604)
80-80: Probable insecure usage of temporary file or directory: "/tmp/vhost.sock"
(S108)
173-173: Function call with shell=True parameter identified, security issue
(S604)
178-178: Function call with shell=True parameter identified, security issue
(S604)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.8
- GitHub Check: Python 3.9
- GitHub Check: Python 3.11
- GitHub Check: Python 3.12
🔇 Additional comments (3)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (3)
72-83: Setup flow and parameter parsing look consistent and safeUsing
ast.literal_evalfordisk_dictandvm_attrsis much safer thaneval, and the VM XML manipulation (setup_attrs,create_vm_device_by_type,add_device,sync) matches existing tp‑libvirt patterns. The initial boot/login plus vhostuser presence check viavirsh.dumpxmlprovides a reasonable pre‑migration sanity gate.Also applies to: 100-110
124-148: Post‑migration in‑guest verification is solidSwitching
vm.connect_urito the destination, recreating the serial console, checkinglsblkfordevice_target, and then doing add_data_to_vm_diskwrite are good end‑to‑end checks that the vhostuser disk survives migration and remains usable. Thefinallyblock closingvm_sessionand restoringvm.connect_urikeeps the connection handling tidy.
154-185: Overall test orchestration and cleanup are well‑structuredThe
runfunction’s structure—setup_test()→run_migration()→verify_test()withcleanup_test()infinally—aligns with other migration tests. Backing up the inactive XML and restoring it in cleanup, plus guardingvm.destroy()withvm.is_alive(), should leave the VM definition in a clean state after the test.Once the service lifecycle points above are addressed, the test looks ready to go.
Also applies to: 187-208
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Outdated
Show resolved
Hide resolved
| start_sock_service_cmd = ( | ||
| 'systemd-run --uid qemu --gid qemu /usr/bin/qemu-storage-daemon' | ||
| ' --blockdev \'{"driver":"file","filename":"%s","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}\'' | ||
| ' --blockdev \'{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}\'' | ||
| ' --export vhost-user-blk,id=vhost-user-blk0,node-name=libvirt-1-format,addr.type=unix,addr.path=%s,writable=on' | ||
| ' --chardev stdio,mux=on,id=char0; sleep 3' | ||
| % (image_path, sock_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tighten qemu‑storage‑daemon lifecycle management (systemd‑run output + killall)
Two related points here:
-
Service ID from
systemd-run
start_vhost_sock_service_in_source/_in_remotecapture the raw stdout ofsystemd-runasvsock_service_id/remote_vsock_service_id, then later callsystemctl stop {id}.systemd-runtypically prints a sentence likeRunning as unit: foo.service, not just the unit name, so passing that whole string tosystemctl stopis fragile and likely a no‑op. You either need to:- Specify a known
--unit=<name>in thesystemd-runcommand and use that name directly forsystemctl stop, or - Parse the unit name out of stdout before storing it.
- Specify a known
-
Global
killall qemu-storage-daemonon both hosts
Thepidof qemu-storage-daemon && killall qemu-storage-daemoncleanup on source and remote will terminate allqemu-storage-daemoninstances, not just the one this test started. That can interfere with other tests or any legitimate services using qemu‑storage‑daemon on the same hosts.
It would be safer to manage just the service(s) you start (via explicit --unit names or PIDs) and avoid broad killall unless you know the environment is completely dedicated.
Also applies to: 95-98, 166-170, 173-178
🤖 Prompt for AI Agents
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
around lines 86-92 (and similarly 95-98, 166-170, 173-178): the test starts
qemu-storage-daemon with systemd-run and later stops it by passing the raw
systemd-run stdout to systemctl and by running global killall; change the start
command to include an explicit --unit=<unique_name> (or capture and parse the
unit name from systemd-run stdout) and store that unit name so subsequent
systemctl stop uses only that unit, and stop using killall qemu-storage-daemon —
instead record the daemon PID(s) you start (or rely on the named systemd unit)
and only kill/stop those specific PIDs/units during cleanup to avoid terminating
other processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adjusted this part, please check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the current state of the vhostuser migration test file to see the adjustments
cat libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Length of output: 7424
@meinaLi, I appreciate the effort, but the core issue isn't fully resolved yet. While you removed the broken systemctl stop logic (which is good), the cleanup still uses pkill -f qemu-storage-daemon, which has the same problem as the original killall qemu-storage-daemon.
The problem: Both killall and pkill -f qemu-storage-daemon will terminate all qemu-storage-daemon processes on the host—not just the ones your test started. This can break other concurrent tests or legitimate services.
Recommended solution (choose one):
Option 1: Use explicit systemd unit names (cleanest approach)
# In the command construction (line ~86):
unit_name = f"vhost-test-{vm_name}-{int(time.time())}" # unique unit name
start_sock_service_cmd = (
f'systemd-run --unit={unit_name} --uid qemu --gid qemu /usr/bin/qemu-storage-daemon'
' --blockdev \'{"driver":"file","filename":"%s","node-name":"libvirt-1-storage",...}\''
...
)
# Store the unit_name in setup_test (return from start_vhost_* functions)
# Then in cleanup_test:
process.run(f"systemctl stop {unit_name}", ignore_status=True, shell=True)
remote.run_remote_cmd(f"systemctl stop {remote_unit_name}", params, ignore_status=True)Option 2: Track PIDs
Capture the PID from the systemd-run output or the qemu-storage-daemon process itself, then kill only that specific PID during cleanup.
Would you like me to provide a more detailed code example for either approach?
🧠 Learnings used
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
8ca0104 to
822da95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (2)
27-37: SELinux booleandomain_can_mmap_filesis changed persistently on both hosts without restoreBoth locally (Line 36–37) and remotely (Line 57–58) you run
setsebool domain_can_mmap_files 1 -P, which permanently flips this SELinux boolean for the entire system and survives reboots. The test never restores the previous value, so it leaves a host‑wide configuration change behind and may affect other tests or users.Consider one of the following instead:
- Treat
domain_can_mmap_files=onas an environmental precondition and document it in the test/config, removing thesetseboolcalls from the test, or- Capture the original value via
getsebool domain_can_mmap_fileson source/remote and restore it incleanup_test(or a higher‑level teardown) so the host returns to its prior state.As written this remains a major environment side‑effect.
Also applies to: 49-58
164-174:pkill -f qemu-storage-daemonkills all daemons on both hostsIn
cleanup_test, the following commands:
- Line 165:
remote.run_remote_cmd("pkill -f qemu-storage-daemon", ...)- Line 169:
process.run("pkill -f qemu-storage-daemon", ...)will terminate every
qemu-storage-daemonprocess on the remote and local hosts, not just the instance(s) started by this test. This can interfere with parallel tests or any other services using qemu‑storage‑daemon, which is the same concern previously raised forkillall.Since you already start the daemon via
systemd-run, a more targeted approach would be to:
- Give the transient unit an explicit, unique name via
--unit=<name>instart_sock_service_cmd, and- In
cleanup_test, stop only that unit on each host, e.g.systemctl stop <name>(withignore_status=True), then drop thepkillcalls entirely.That keeps the cleanup scoped strictly to what this test started instead of indiscriminately killing all daemons.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg(1 hunks)libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- libvirt/tests/cfg/migration/migration_with_disk/migration_with_vhostuser.cfg
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T08:01:27.899Z
Learnt from: hholoubk
Repo: autotest/tp-libvirt PR: 6579
File: libvirt/tests/src/sriov/vIOMMU/iommu_device_lifecycle.py:95-97
Timestamp: 2025-09-24T08:01:27.899Z
Learning: In the libvirt test framework used in tp-libvirt, VM cleanup including destroying running VMs is handled by the teardown method (test_obj.teardown_iommu_test()) called in the finally block, so explicit VM destroy calls on timeout are not necessary according to the maintainers.
Applied to files:
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
🧬 Code graph analysis (1)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (1)
libvirt/tests/src/migration/migrate_vm.py (1)
run_remote_cmd(759-776)
🪛 Ruff (0.14.6)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
30-30: Function call with shell=True parameter identified, security issue
(S604)
32-32: Local variable cmd_output is assigned to but never used
Remove assignment to unused variable cmd_output
(F841)
32-32: Function call with shell=True parameter identified, security issue
(S604)
35-35: Function call with shell=True parameter identified, security issue
(S604)
37-37: Function call with shell=True parameter identified, security issue
(S604)
55-55: Local variable remote_vsock_service_id is assigned to but never used
Remove assignment to unused variable remote_vsock_service_id
(F841)
78-78: Probable insecure usage of temporary file or directory: "/tmp/vhost.sock"
(S108)
169-169: Function call with shell=True parameter identified, security issue
(S604)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.12
- GitHub Check: Python 3.8
- GitHub Check: Python 3.11
- GitHub Check: Python 3.9
🔇 Additional comments (1)
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py (1)
61-151: Overall migration flow and VM‑side verification look solidThe high‑level orchestration in
run()—setup_testto configure the vhost-user backend and disk,migration_obj.run_migration(),verify_testto switch to the destination and exercise the disk vialsblk+dd_data_to_vm_disk, andmigration_obj.verify_default()—is coherent and matches existing migration patterns in tp-libvirt. The use ofast.literal_evalfordisk_dictandvm_attrs, plus serial console login on the destination, is a nice touch for safety and robustness.Aside from the SELinux/daemon‑lifecycle concerns noted elsewhere, the core test logic reads correctly.
Also applies to: 178-197
libvirt/tests/src/migration/migration_with_disk/migration_with_vhostuser.py
Outdated
Show resolved
Hide resolved
Automate case: RHEL-200524: Migrate VM with a vhostuser disk Signed-off-by: Meina Li <meili@redhat.com>
822da95 to
805abd7
Compare
Automate case:
RHEL-200524: Migrate VM with a vhostuser disk
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.