Commit 0c7325f
committed
Merged PR 12878455: Fix for the Rego "metadata desync" bug
[cherry-picked from 421b12249544a334e36df33dc4846673b2a88279]
This set of changes fixes the [Metadata Desync with UVM
State](https://msazure.visualstudio.com/One/_workitems/edit/33232631/) bug, by
reverting the Rego policy state on mount and some types of unmount failures.
For mounts, a minor cleanup code is added to ensure we close down the dm-crypt
device if we fails to mount it. Aside from this, it is relatively
straightforward - if we get a failure, the clean up functions will remove the
directory, remove any dm-devices, and we will revert the Rego metadata.
For unmounts, careful consideration needs to be taken, since if the directory
has been unmounted successfully (or even partially successful?), and we get an
error, we cannot just revert the policy state, as this may allow the host to use
a broken / empty mount as one of the image layer. See 615c9a0bdf's commit
message for more detailed thoughts.
The solution I opted for is, for non-trivial unmount failure cases (i.e. not
policy denial, not invalid mountpoint), if it fails, then we will block all
further mount, unmount, container creation and deletion attempts. I think this
is OK since we really do not expect unmounts to fail - this is especially the
case for us since the only writable disk mount we have is the shared scratch
disk, which we do not unmount at all unless we're about to kill the UVM.
Testing
-------
The "Rollback policy state on mount errors" commit message has some instruction
for making a deliberately corrupted (but still passes the verifyinfo extraction)
VHD that will cause a mount error. The other way we could make mount / unmount
fail, and thus test this change, is by mounting a tmpfs or ro bind in relevant
places:
To make unmount fail:
mkdir /run/gcs/c/.../rootfs/a && mount -t tmpfs none /run/gcs/c/.../rootfs/a
or
mkdir /run/gcs/mounts/scsi/m1/a && mount -t tmpfs none /run/gcs/mounts/scsi/m1/a
To make mount fail:
mount -o ro --bind /run/mounts/scsi /run/mounts/scsi
or
mount --bind -o ro /run/gcs/c /run/gcs/c
Once failure is triggered, one can make them work again by `umount`ing the tmpfs
or ro bind.
What about other operations?
----------------------------
This PR covers mount and unmount of disks, overlays and 9p. Aside from setting
`metadata.matches` as part of the narrowing scheme, and except for
`metadata.started` to prevent re-using a container ID, Rego does not use
persistent state for any other operations. Since it's not clear whether
reverting the state would be semantically correct (we would need to carefully
consider exactly what are the side effects of say, failing to execute a process,
start a container, or send a signal, etc), and adding the revert to those
operations does not currently affect much behaviour, I've opted not to apply the
metadata revert to those for now.
Signed-off-by: Tingmao Wang <tingmaowang@microsoft.com>1 parent 314dc3d commit 0c7325f
File tree
12 files changed
+932
-42
lines changed- internal
- gcs
- guest
- runtime/hcsv2
- storage
- overlay
- scsi
- regopolicyinterpreter
- pkg/securitypolicy
12 files changed
+932
-42
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
0 commit comments