
Force shut down dependent VMs passed via restart parameter #1248

Closed
rocodes wants to merge 1 commit from the 1247-sdlog-shutdown branch

Conversation

@rocodes (Contributor) commented Feb 14, 2025

Status

Ready for review

Description of Changes

Fixes #1247

Changes proposed in this pull request:
Ensure that VMs passed via the restart parameter in sdw-admin's configure method exist before restarting them, and force shut them down instead of requesting a normal shutdown.

The configure method configures a given VM and optionally takes a list of other, dependent VMs that should be rebooted once the changes are applied (basically: as soon as a template changes, restart a set of VMs based on it). Right now we use that for sd-log and sys-usb when we change their templates.
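For illustration, a minimal sketch of the resulting behavior (the helper name `restart_dependent_vms` is invented here; `run_cmd` stands in for the existing helper in files/sdw-admin.py and is assumed to raise on a non-zero exit status):

```python
import subprocess

from qubesadmin import Qubes

def run_cmd(args):
    # Stand-in for sdw-admin's helper: raises CalledProcessError on a
    # non-zero exit status, which aborts provisioning.
    subprocess.check_call(args)

def restart_dependent_vms(restart):
    """Force-restart each dependent VM, skipping names that don't exist."""
    for domain in restart:
        # Qubes().domains supports membership tests by VM name, so
        # nonexistent VMs are skipped instead of failing qvm-shutdown.
        if domain in Qubes().domains:
            # --force skips the graceful-shutdown path that can fail or
            # time out; --wait blocks until the VM is actually down.
            run_cmd(["qvm-shutdown", "--wait", "--force", "--", domain])
            run_cmd(["qvm-start", "--", domain])
```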

Testing

  • Visual Review
  • CI (dom0 tests)

Deployment

n/a

Checklist

If you have made changes to the provisioning logic

  • All tests (make test) pass in dom0

If you have added or removed files

  • I have updated MANIFEST.in and rpm-build/SPECS/securedrop-workstation-dom0-config.spec

If documentation is required

  • I have opened a PR in the docs repo for these changes, or will do so later
  • I would appreciate help with the documentation

@rocodes requested a review from a team on February 14, 2025 at 14:15
@rocodes force-pushed the 1247-sdlog-shutdown branch 4 times, most recently from 7d1439f to ffb2a41 (February 14, 2025 at 20:14)
…applying changes. Force shut down those VMs in `configure` method when their template changes.
@rocodes force-pushed the 1247-sdlog-shutdown branch from ffb2a41 to 4f60038 (February 14, 2025 at 23:10)
@deeplow self-assigned this on Feb 18, 2025
@rocodes (Contributor, Author) commented Feb 18, 2025

Just adding an explanatory comment here in case it's helpful: the configure method allows an additional, optional list of hard-coded VMs to be passed in, to be rebooted once the target VM has been configured. But there are no checks in configure that the names in that list correspond to VMs that exist at the time the method is called. So it would be possible to, say, call configure("sd-small-bookworm-template", restart=["sd-madeup"]). We would then call run_cmd(["qvm-shutdown", "--wait", "--", "sd-madeup"]), which would return a non-zero exit status and raise an error.

This PR addresses that case.

I don't have strong feelings about the force-shutdown. A more conservative approach would be to just try this without that change and see if we still run into issues in CI. The reason I kept it in is that if we don't succeed here, provisioning fails, and elsewhere (e.g. in securedrop-handle-upgrade) we are even more aggressive, killing VMs instead of force shutting them down. I think the CI issues we have been seeing will be resolved by the check that makes sure the VM is in the list of available domains.
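To make that failure mode concrete, a minimal reproduction (the VM name `sd-madeup` is invented, as above; this assumes a check_call-style wrapper like the sketch earlier):

```python
import subprocess

try:
    # With no existence check, this runs for a VM that was never created.
    # qvm-shutdown prints "qvm-shutdown: error: no such domain: 'sd-madeup'"
    # and exits non-zero.
    subprocess.check_call(["qvm-shutdown", "--wait", "--", "sd-madeup"])
except subprocess.CalledProcessError as err:
    # In sdw-admin this exception propagates and fails provisioning.
    print(f"provisioning would abort here (exit status {err.returncode})")
```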

@deeplow (Contributor) left a comment

Thanks for the PR. I don't think this is the root cause, though; I have elaborated more on the original issue. But it's a welcome fix: the --force (pun intended) also saves users from unnecessary "failed to shut down" messages.

When I originally moved the bash code into configure()-like statements, I think I passed in all targeted qubes at once because I assumed it would be more efficient. Your commit shuts them down one by one. However, the code that shuts down qubes is extremely inefficient anyway, especially in dom0, where it could know which qubes are netVMs. So it doesn't make much of a difference; I just thought I'd note why it was like that originally.

I added minor code suggestions.

files/sdw-admin.py (review thread, resolved):
run_cmd(["qvm-shutdown", "--wait", "--"] + restart)
run_cmd(["qvm-start", "--"] + restart)
for domain in restart:
if domain in Qubes().domains:
@deeplow (Contributor) replied, quoting @rocodes' explanatory comment above:

> But there are no checks in configure that the names in that list correspond to VMs that exist at the time the method is called. […]

I would actually advocate for this to be a hard fail. We should catch it during testing/QA. Otherwise, we run the risk of silencing something we never intended to pass in the first place.

If we were to add an extra check, I'd say we should check that the qubes being restarted are based on the targets (if the targets are templates). Or even remove the restart parameter and obtain the list by finding all the qubes whose templates were targets; see the sketch below.
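A rough sketch of that alternative, using the qubesadmin API available in dom0 (the helper name `qubes_based_on` and the getattr check are illustrative assumptions, not code from this PR):

```python
from qubesadmin import Qubes

def qubes_based_on(templates):
    """Names of all qubes whose template is one of the given targets."""
    restart = []
    for vm in Qubes().domains:
        # Template-less qubes (e.g. TemplateVMs themselves) raise
        # AttributeError on .template; getattr() lets us skip them.
        template = getattr(vm, "template", None)
        if template is not None and str(template) in templates:
            restart.append(vm.name)
    return restart

# e.g. restart = qubes_based_on(["sd-small-bookworm-template"])
```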

@rocodes (Contributor, Author) commented Feb 19, 2025

@deeplow Are you requesting changes, or is this PR satisfactory? (Specifically, are you requesting more in-depth logic around the restart parameter: either a hard fail if the template doesn't exist, or additional logic that enforces some relationship between the VM under configuration and any VMs passed via restart?)

@deeplow (Contributor) commented Feb 19, 2025

> @deeplow Are you requesting changes, or is this PR satisfactory? (Specifically, are you requesting more in-depth logic around the restart parameter: either a hard fail if the template doesn't exist, or additional logic that enforces some relationship between the VM under configuration and any VMs passed via restart?)

I should have been clearer, sorry about that. I'm not sure about the part of the PR that tries to shut down qubes which don't yet exist (because I don't think that should happen, and if it does, we will want to catch it). My suggestion would be to just remove the `if domain in Qubes().domains:` check. That will lead to a hard fail when a domain shutdown fails, as in the sketch below.
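Applied to the loop above, the suggestion would look roughly like this (a sketch reusing the `run_cmd` stand-in from earlier; the `--force` flag is kept per the review comment):

```python
for domain in restart:
    # No existence check: an unknown or unstoppable domain makes
    # qvm-shutdown exit non-zero, and run_cmd turns that into a hard fail.
    run_cmd(["qvm-shutdown", "--wait", "--force", "--", domain])
    run_cmd(["qvm-start", "--", domain])
```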

If you look at the original issue's log, you see:

INFO:[2025-02-13-23:03:48:065875] qvm-shutdown: error: Failed to shut down: sd-log

If it were due to sd-log not existing, the error message would instead have been:

qvm-shutdown: error: no such domain: 'sd-log'

So I think we should keep this part as it was. But it could be that I am misunderstanding the original problem.

@rocodes (Contributor, Author) commented Feb 19, 2025

Per #1247 (comment), a complete solution would involve moving the autostart configuration of sd-log until after provisioning completes, or something else to address the underlying race. To be revisited.

@rocodes closed this on Feb 19, 2025
Labels: none yet
Linked issues (merging may close): Sometimes sd-log shutdown error makes provisioning fail (#1247)
Participants: 2