Allow injecting faults via a unix socket #3

serathius · 2023-09-15T11:33:25Z

Hey, thanks for adding the "reorder" and "split_write" faults and for filing the etcd-io/etcd#16596 bug. I'm really interested in using those in etcd testing.

However, the current implementation of fault injection is not compatible with how etcd is tested. We need to control when and how the injection happens so we can verify the etcd state before and after.

I would love it if LazyFS provided a command over a unix socket (similar to clear cache) that lets the user invoke faults like "reorder" on the last write.

Issue to track this on the etcd side: etcd-io/etcd#16597

Thanks for all your great work!

mj-ramos · 2023-09-20T13:12:48Z

Hello, I explained in some detail why these two faults are currently not injected through the FIFO, along with some possibilities that could be useful for etcd testing, in this comment: etcd-io/etcd#16597 (comment)

Thanks for the feedback!

fuweid · 2023-11-18T10:33:00Z

Hi, @mj-ramos

Recently, I have been using dm-flakey (https://github.com/fuweid/go-dmflakey) to simulate power failure.
But the dm-flakey device works at the BIO layer and doesn't distinguish between file content and filesystem metadata.
With drop_writes it is easy to break the filesystem. I created a reproducer test case for boltdb.
It's tricky because the test only uses fdatasync. So I am wondering whether LazyFS could provide a similar function at the file level.

drop_writes: it could be triggered via the FIFO. When it is enabled, all write I/O should be redirected to a temporary file so that reads keep working. After the cache is cleared, the temporary file should be deleted, which simulates the data loss after a power failure. (It is similar to split_write, but there is no need to specify which write syscall should be split.)

What do you think about it?

mj-ramos · 2023-12-11T15:08:12Z

Hi, I apologize for the late reply. I've been quite busy lately. Thank you for your interest in LazyFS!

Certainly, LazyFS can accomplish that. If I understand correctly, the feature you are suggesting appears to be somewhat similar to the clear cache command, which removes all the contents of the cache at a certain point. With this command, reads are not compromised; however, if neither fsync nor fdatasync is issued, the writes will be dropped. It seems to me that you specifically want to discard writes for a particular file? Is this correct?
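The semantics described above (reads see unsynced writes, fsync/fdatasync makes them durable, clear cache drops whatever was not synced) can be sketched as a toy in-memory model in Go. This is only an illustration of the behavior being discussed, not LazyFS code; `modelFile` and its methods are hypothetical names.

```go
package main

import "fmt"

// modelFile is a toy model of one file behind a write-back cache.
type modelFile struct {
	durable []byte // content that would survive a power failure
	cached  []byte // content that reads observe (durable + unsynced writes)
}

// Write appends to the cached view only; nothing becomes durable yet.
func (f *modelFile) Write(p []byte) { f.cached = append(f.cached, p...) }

// Read returns a copy of what a reader sees, including unsynced writes.
func (f *modelFile) Read() []byte { return append([]byte(nil), f.cached...) }

// Sync models fsync/fdatasync: the cached view becomes the durable content.
func (f *modelFile) Sync() { f.durable = append([]byte(nil), f.cached...) }

// ClearCache models the clear cache command: unsynced writes are dropped
// and reads fall back to the durable content.
func (f *modelFile) ClearCache() { f.cached = append([]byte(nil), f.durable...) }

func main() {
	f := &modelFile{}
	f.Write([]byte("a"))
	f.Sync()
	f.Write([]byte("b")) // never synced
	fmt.Printf("%s\n", f.Read()) // prints "ab"
	f.ClearCache()
	fmt.Printf("%s\n", f.Read()) // prints "a"
}
```

A per-file drop_writes, as proposed earlier in the thread, would amount to applying `ClearCache` to selected files only rather than to the whole cache.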

serathius changed the title ~~Allow injecting fauls via a unix socket~~ Allow injecting faults via a unix socket Sep 15, 2023
