
Force shut down dependent VMs passed via restart parameter #1248

Closed
rocodes wants to merge 1 commit from the 1247-sdlog-shutdown branch

Conversation

@rocodes (Contributor) commented Feb 14, 2025

Status

Ready for review

Description of Changes

Fixes #1247

Changes proposed in this pull request:
Ensure that VMs passed via the restart parameter in sdw-admin's configure method exist before restarting them, and force shut them down instead of requesting a normal shutdown.

The configure method configures a given VM and optionally takes a list of other, dependent VMs that should be rebooted once the changes are applied (basically: as soon as a template changes, restart a set of VMs based on it). Right now we use that for sd-log and sys-usb when we change their templates.
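For illustration, a minimal sketch of the resulting behavior (the helper name `restart_dependent_vms` is invented here; `run_cmd` stands in for the existing helper in files/sdw-admin.py and is assumed to raise on a non-zero exit status):

```python
import subprocess

from qubesadmin import Qubes

def run_cmd(args):
    # Stand-in for sdw-admin's helper: raises CalledProcessError on a
    # non-zero exit status, which aborts provisioning.
    subprocess.check_call(args)

def restart_dependent_vms(restart):
    """Force-restart each dependent VM, skipping names that don't exist."""
    for domain in restart:
        # Qubes().domains supports membership tests by VM name, so
        # nonexistent VMs are skipped instead of failing qvm-shutdown.
        if domain in Qubes().domains:
            # --force skips the graceful-shutdown path that can fail or
            # time out; --wait blocks until the VM is actually down.
            run_cmd(["qvm-shutdown", "--wait", "--force", "--", domain])
            run_cmd(["qvm-start", "--", domain])
```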

Testing

  • Visual Review
  • CI (dom0 tests)

Deployment

n/a

Checklist

If you have made changes to the provisioning logic

  • All tests (make test) pass in dom0

If you have added or removed files

  • I have updated MANIFEST.in and rpm-build/SPECS/securedrop-workstation-dom0-config.spec

If documentation is required

  • I have opened a PR in the docs repo for these changes, or will do so later
  • I would appreciate help with the documentation

@rocodes requested a review from a team on February 14, 2025 at 14:15
@rocodes force-pushed the 1247-sdlog-shutdown branch 4 times, most recently from 7d1439f to ffb2a41 (February 14, 2025 at 20:14)
…applying changes. Force shut down those VMs in `configure` method when their template changes.
@rocodes force-pushed the 1247-sdlog-shutdown branch from ffb2a41 to 4f60038 (February 14, 2025 at 23:10)
@deeplow self-assigned this on Feb 18, 2025
@rocodes (Contributor, Author) commented Feb 18, 2025

Just adding an explanatory comment here in case it's helpful: the configure method allows an additional, optional list of hard-coded VMs to be passed in, to be rebooted once the target VM has been configured. But there are no checks in configure that the names in that list correspond to VMs that exist at the time the method is called. So it would be possible to, say, call configure("sd-small-bookworm-template", restart=["sd-madeup"]). We would then call run_cmd(["qvm-shutdown", "--wait", "--", "sd-madeup"]), which would return a non-zero exit status and raise an error.

This PR addresses that case.

I don't have strong feelings about the force-shutdown. A more conservative approach would be to just try this without that change and see if we still run into issues in CI. The reason I kept it in is that if we don't succeed here, provisioning fails, and elsewhere (e.g. in securedrop-handle-upgrade) we are even more aggressive, killing VMs instead of force shutting them down. I think the CI issues we have been seeing will be resolved by the check that makes sure the VM is in the list of available domains.
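To make that failure mode concrete, a minimal reproduction (the VM name `sd-madeup` is invented, as above; this assumes a check_call-style wrapper like the sketch earlier):

```python
import subprocess

try:
    # With no existence check, this runs for a VM that was never created.
    # qvm-shutdown prints "qvm-shutdown: error: no such domain: 'sd-madeup'"
    # and exits non-zero.
    subprocess.check_call(["qvm-shutdown", "--wait", "--", "sd-madeup"])
except subprocess.CalledProcessError as err:
    # In sdw-admin this exception propagates and fails provisioning.
    print(f"provisioning would abort here (exit status {err.returncode})")
```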

@deeplow (Contributor) left a comment

Thanks for the PR. I don't think this is the root cause, though; I have elaborated more on the original issue. But it's a welcome fix: the --force (pun intended) also saves users from unnecessary "failed to shut down" messages.

When I originally moved the bash code into configure()-like statements, I think I passed in all targeted qubes at once because I assumed it would be more efficient. Your commit shuts them down one by one. However, the code that shuts down qubes is extremely inefficient anyway, especially in dom0, where it could know which qubes are netVMs. So it doesn't make much of a difference; I just thought I'd note why it was like that originally.

I added minor code suggestions.

files/sdw-admin.py (review thread, resolved):
run_cmd(["qvm-shutdown", "--wait", "--"] + restart)
run_cmd(["qvm-start", "--"] + restart)
for domain in restart:
if domain in Qubes().domains:
@deeplow (Contributor) replied, quoting @rocodes' explanatory comment above:

> But there are no checks in configure that the names in that list correspond to VMs that exist at the time the method is called. […]

I would actually advocate for this to be a hard fail. We should catch it during testing/QA. Otherwise, we run the risk of silencing something we never intended to pass in the first place.

If we were to add an extra check, I'd say we should check that the qubes being restarted are based on the targets (if the targets are templates). Or even remove the restart parameter and obtain the list by finding all the qubes whose templates were targets; see the sketch below.
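A rough sketch of that alternative, using the qubesadmin API available in dom0 (the helper name `qubes_based_on` and the getattr check are illustrative assumptions, not code from this PR):

```python
from qubesadmin import Qubes

def qubes_based_on(templates):
    """Names of all qubes whose template is one of the given targets."""
    restart = []
    for vm in Qubes().domains:
        # Template-less qubes (e.g. TemplateVMs themselves) raise
        # AttributeError on .template; getattr() lets us skip them.
        template = getattr(vm, "template", None)
        if template is not None and str(template) in templates:
            restart.append(vm.name)
    return restart

# e.g. restart = qubes_based_on(["sd-small-bookworm-template"])
```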

@rocodes (Contributor, Author) commented Feb 19, 2025

@deeplow Are you requesting changes, or is this PR satisfactory? (Specifically, are you requesting more in-depth logic around the restart parameter: either a hard fail if the template doesn't exist, or additional logic that enforces some relationship between the VM under configuration and any VMs passed via restart?)

@deeplow (Contributor) commented Feb 19, 2025

> @deeplow Are you requesting changes, or is this PR satisfactory? (Specifically, are you requesting more in-depth logic around the restart parameter: either a hard fail if the template doesn't exist, or additional logic that enforces some relationship between the VM under configuration and any VMs passed via restart?)

I should have been clearer, sorry about that. I'm not sure about the part of the PR that tries to shut down qubes which don't yet exist (because I don't think that should happen, and if it does, we will want to catch it). My suggestion would be to just remove the `if domain in Qubes().domains:` check. That will lead to a hard fail when a domain shutdown fails, as in the sketch below.
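Applied to the loop above, the suggestion would look roughly like this (a sketch reusing the `run_cmd` stand-in from earlier; the `--force` flag is kept per the review comment):

```python
for domain in restart:
    # No existence check: an unknown or unstoppable domain makes
    # qvm-shutdown exit non-zero, and run_cmd turns that into a hard fail.
    run_cmd(["qvm-shutdown", "--wait", "--force", "--", domain])
    run_cmd(["qvm-start", "--", domain])
```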

If you look at the original issue's log, you see:

INFO:[2025-02-13-23:03:48:065875] qvm-shutdown: error: Failed to shut down: sd-log

If it were due to sd-log not existing, the error message would instead have been:

qvm-shutdown: error: no such domain: 'sd-log'

So I think we should keep this part as it was. But it could be that I am misunderstanding the original problem.

@rocodes (Contributor, Author) commented Feb 19, 2025

Per #1247 (comment), a complete solution would involve moving the autostart configuration of sd-log until after provisioning completes, or something else to address the underlying race. To be revisited.

@rocodes closed this on Feb 19, 2025
Labels: none yet
Linked issues (merging may close): Sometimes sd-log shutdown error makes provisioning fail (#1247)
Participants: 2