block: add support for discard #329

g2p · 2019-03-07T14:40:49Z

The idea is to support operations compatible with mirage-block-unix's discard:

discard sector n signals that the n sectors starting at sector are no longer needed and the contents may be discarded. Note the contents may not actually be deleted: this is not a "secure erase".

Which I would update to say this (matching the Unix implementation):

discard sector n signals that the n sectors starting at sector are no longer needed and the contents may be discarded. Future reads will return zeroes. Note that the contents may not actually be impossible to recover: this is not a "secure erase".

mato · 2019-03-07T15:00:07Z

Yes. You probably want to look at #325 first though.

g2p · 2019-03-07T16:10:04Z

Since this interface should be sector aligned, should we break consistency and take offsets and length as sectors? Would it simplify the checking the spt backend would need to do in case the implementation is specialized there (re #325 (comment))?

mato · 2019-03-07T16:43:23Z

That depends on what the backend implementation needs to do. As I've never needed to invoke a discard-type op from Linux, I don't know.

The Solo5 interface I would prefer is:

solo5_result_t solo5_block_discard(solo5_off_t offset, size_t size)

with the usual restrictions (offset must be block-size-aligned, size must be a multiple of block-size).

If you find out what the backend(s) actually need(s) to do to implement a discard (i.e. at least the hvt tender and spt bindings) I'll be able to tell you more.

g2p · 2019-03-07T17:34:12Z

With spt bindings on Linux, this will use fallocate with FALLOC_FL_PUNCH_HOLE and fallback to zeroing manually if the filesystem doesn't support it.

I'm not sure about tenders/hvt because that should just validate and pass params?

I've also looked at virtio (spec amendment); it would use the VIRTIO_BLK_T_WRITE_ZEROES command with the unmap bit set (with some fallbacks which are easy to add). The offsets and lengths are expressed in 512-byte units, but the alignment requirement may be greater.

g2p · 2019-03-07T18:16:38Z

(just to document the virtio fallbacks while the spec is fresh on my mind: if VIRTIO_BLK_F_WRITE_ZEROES is not supported, do the whole thing manually. Otherwise, call VIRTIO_BLK_T_WRITE_ZEROES with unmap set to whether VIRTIO_BLK_F_DISCARD is supported)

mato · 2019-03-08T09:26:10Z

With spt bindings on Linux, this will use fallocate with FALLOC_FL_PUNCH_HOLE and fallback to zeroing manually if the filesystem doesn't support it.

That man page does not actually mention anything about the call issuing a discard (or TRIM) to the hardware. Do you have a source for this?

Anyway, so the Linux backends need to issue a fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, offset, len) call. For spt, the seccomp filter needs to ensure that SYS_fallocate can only be called with that particular value of mode. If I understand the behaviour as documented correctly, the filter should not need to check offset or len.

For hvt, you need to create a hypercall and back it with that implementation.

Before you actually start on this, can you check with the *BSD folks you have around at the retreat what the equivalent calls are there?

mato · 2019-03-08T09:28:22Z

Oh. One more thing, just re-reading the manpage. It only talks about filesystem-backed fds. How do we do the equivalent if the fd is pointing directly to a block device (this is a valid and important use-case!).

g2p · 2019-03-08T10:33:11Z

Linux also implements it for block devices:
See blkdev_fallocate in https://github.com/torvalds/linux/blob/master/fs/block_dev.c

At any rate, the logic for this will be the same regardless of what the backing store is; if ENOTSUP is returned, write zeroes manually.

g2p · 2019-03-08T10:36:06Z

I don't think this exists on BSD per this recent thread: https://freebsd-arch.freebsd.narkive.com/JKZnzc4p/hole-punching-trim-etc
But since we need fallback logic regardless, that is not too problematic.

g2p · 2019-03-08T10:40:57Z

That man page does not actually mention anything about the call issuing a discard (or TRIM) to the hardware. Do you have a source for this?

That's implementation and configuration (mount options, actual device capabilities) dependent.
It's also an optimization and an implementation detail.

mato · 2019-03-08T16:07:02Z

At any rate, the logic for this will be the same regardless of what the backing store is; if ENOTSUP is returned, write zeroes manually.

Right, just make sure that you do that in a single pwrite() call, that way you won't have to modify the pwrite part of the seccomp filter for spt.

mato · 2019-03-09T12:18:33Z

@g2p Moving your questions from Slack here, as I'd like the design discussion to be in one place:

What would I use to allocate a buffer of zeroes from the bindings side? [...] mem_ialloc_pages + memset?

Correct.

The buffer size would be bounded, but to a fairly large value. I'm planning on using 1M for transfer efficiency, or the block size, whichever is greater.

I don't think keeping a 1M buffer around is an acceptable cost to pay just for the sake of being able to emulate discard.

Stepping back a bit, do we need to emulate it at all? Why? In my mind, discard is just an "optimization" for SSDs and/or thinly provisioned storage. If the host doesn't support it, can't it just be a no-op?

g2p · 2019-03-09T12:32:30Z

I need it to actually ensure later reads return zeroes.
If there's an error code that can unambiguously mean ENOTSUP, I'd be okay moving any fallback emulation to a layer higher than Solo5. Or an out parameter. Unfortunately it can't be done with a block_info flag as some of the implementations are based on ENOTSUP return codes, which means you don't know ahead of time.

g2p · 2019-03-09T14:20:31Z

With further thought, I'd be good with flags that ask for zeroing or not, in addition to the out parameter and higher-level fallback logic mentioned above.
I'm not sure if I would expose the additional flags to the mirage interface.
They might be good for future-proofing, I guess, but will trigger breakage on the mirage-block-unix interface.

mato · 2019-03-10T13:16:09Z

I need it to actually ensure later reads return zeroes.

Ok. It seems to me then that the simplest possible interface at the Solo5 layer is that mentioned in my previous comment (#329 (comment)), with the addition of a SOLO5_R_EOPNOTSUPP to solo5_result_t, which solo5_block_discard() should return if the host/implementation does not support the operation.

In other words, no emulation, and the application flow becomes:

r = solo5_block_discard(offset, size);
if (r == SOLO5_R_EOPNOTSUPP) {
    /* It is now the application's responsibility to fall back to writing zeroes to (offset, size), if it desires that behaviour */
}

I'd be good with flags that ask for zeroing or not,

I'm not sure what you mean by that, but if you mean adding a "is discard supported" flag to the struct returned by solo5_block_info(), I don't think there is anything the application gains in the above case?

g2p · 2019-03-10T13:48:29Z

I like the EOPNOTSUPP approach very much, will do that.

Re an explicit "ask for zeroing" flag (as opposed to discarding and having uninitialized data appear), it's not something I plan to use, and after looking at the complexity of implementing both code paths in virtio, I don't think it's worth it. We'll either offer safe and fast zeroing, or let the application deal.

mato · 2019-03-10T13:54:50Z

Sounds good to me.

g2p · 2019-03-10T15:46:52Z

I've started testing my virtio implementation, but it looks like Qemu doesn't support discard for virtio-blk, only for scsi. I don't know of another supported VMM on Linux so I'll drop that code for now. Still making progress on the spt/hvt backends.

g2p · 2019-03-10T16:00:14Z

crosvm has support for the feature: https://chromium.googlesource.com/chromiumos/platform/crosvm/+/7621d910f56ff85400b252f88fdef324a1cc13d6%5E%21/#F0 via https://bugs.chromium.org/p/chromium/issues/detail?id=850998 .

Not sure how hard it would be to integrate.

g2p · 2019-03-10T17:53:57Z

I've built and run crosvm, but I only get messages from early boot, I don't think it can run Solo5's kernel.
So the virtio bits will have to be dropped for now.

Addresses Solo5#329.

g2p mentioned this issue Mar 7, 2019

Get discard support upstreamed mirage/wodan#13

Open

g2p added a commit to g2p/solo5 that referenced this issue Mar 11, 2019

Add discard support and tests

44ca733

Addresses Solo5#329.

g2p mentioned this issue Mar 11, 2019

Discard support #333

Open

g2p added a commit to g2p/solo5 that referenced this issue Mar 11, 2019

Add discard support and tests

ae845a6

Addresses Solo5#329.

mato added the enhancement label Mar 11, 2019

hannesm mentioned this issue Oct 2, 2019

MirageOS 4.0 draft release roadmap mirage/mirage#994

Closed

37 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

block: add support for discard #329

block: add support for discard #329

g2p commented Mar 7, 2019

mato commented Mar 7, 2019

g2p commented Mar 7, 2019

mato commented Mar 7, 2019

g2p commented Mar 7, 2019

g2p commented Mar 7, 2019 •

edited

Loading

mato commented Mar 8, 2019

mato commented Mar 8, 2019

g2p commented Mar 8, 2019

g2p commented Mar 8, 2019

g2p commented Mar 8, 2019

mato commented Mar 8, 2019

mato commented Mar 9, 2019

g2p commented Mar 9, 2019

g2p commented Mar 9, 2019

mato commented Mar 10, 2019

g2p commented Mar 10, 2019

mato commented Mar 10, 2019

g2p commented Mar 10, 2019

g2p commented Mar 10, 2019

g2p commented Mar 10, 2019

block: add support for discard #329

block: add support for discard #329

Comments

g2p commented Mar 7, 2019

mato commented Mar 7, 2019

g2p commented Mar 7, 2019

mato commented Mar 7, 2019

g2p commented Mar 7, 2019

g2p commented Mar 7, 2019 • edited Loading

mato commented Mar 8, 2019

mato commented Mar 8, 2019

g2p commented Mar 8, 2019

g2p commented Mar 8, 2019

g2p commented Mar 8, 2019

mato commented Mar 8, 2019

mato commented Mar 9, 2019

g2p commented Mar 9, 2019

g2p commented Mar 9, 2019

mato commented Mar 10, 2019

g2p commented Mar 10, 2019

mato commented Mar 10, 2019

g2p commented Mar 10, 2019

g2p commented Mar 10, 2019

g2p commented Mar 10, 2019

g2p commented Mar 7, 2019 •

edited

Loading