Allow injecting faults via a unix socket #3

Open
serathius opened this issue Sep 15, 2023 · 3 comments

Comments

@serathius

serathius commented Sep 15, 2023

Hey, thanks for adding the "reorder" and "split_write" faults and for filing the etcd-io/etcd#16596 bug. I'm really interested in using those in etcd testing.

However, the current implementation of fault injection is not compatible with how etcd is tested. We need to control when and how the injection happens so we can verify the etcd state before and after.

I would love it if LazyFS provided a command over the unix socket (similar to clear cache) that allows the user to invoke faults like "reorder" on the last write.

Issue to track this on the etcd side: etcd-io/etcd#16597

Thanks for all your great work!
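
For illustration, here is a minimal sketch of how a test harness might drive LazyFS through its faults FIFO, in the same way the existing clear cache command is sent. The FIFO path is whatever the LazyFS config sets, and the `lazyfs::clear-cache` string is the command as I understand it from the LazyFS README, so please verify it against your version. A command for "reorder" does not exist today; the helper `send_fifo_command` is a hypothetical name and would simply be called with the new command string if one were added.

```python
# Assumption: the clear-cache command string as given in the LazyFS README.
CLEAR_CACHE = "lazyfs::clear-cache"


def send_fifo_command(fifo_path: str, command: str) -> None:
    """Write one newline-terminated command to the LazyFS faults FIFO.

    Hypothetical helper, not part of LazyFS. Opening a FIFO for writing
    blocks until a reader (the LazyFS process) has it open.
    """
    with open(fifo_path, "w") as fifo:
        fifo.write(command + "\n")
```

A test could then check the etcd state, call `send_fifo_command(path, CLEAR_CACHE)` at the exact point it wants the fault, and check the state again.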

@serathius serathius changed the title Allow injecting fauls via a unix socket Allow injecting faults via a unix socket Sep 15, 2023
@mj-ramos
Collaborator

Hello, I explained in some detail why these two faults are currently not injected through a FIFO, along with some possibilities that could be useful for etcd testing, in this comment: etcd-io/etcd#16597 (comment)

Thanks for the feedback!

@fuweid

fuweid commented Nov 18, 2023

Hi, @mj-ramos

Recently, I have been using dm-flakey (https://github.com/fuweid/go-dmflakey) to simulate power failure.
But the dm-flakey device, which works at the BIO layer, doesn't distinguish between file content and filesystem metadata.
Its drop_writes feature can easily break the filesystem. I created a reproducer test case for boltdb.
It's tricky because the test only uses fdatasync. So I am wondering whether LazyFS could provide similar functionality at the file level.

  • drop_writes: it can be triggered via the FIFO. When it's enabled, all write IO should be redirected to a temporary file so that reads keep working. After the cache is cleared, the temporary file should be deleted to simulate the data loss after a power failure. (It's similar to split_write, but it doesn't need to specify which write syscall should be split.)

What do you think about it?
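
To make the proposal concrete, here is a toy in-memory sketch of the drop_writes behaviour described above. It is not LazyFS code: the class `DropWritesModel` and its method names are hypothetical, the "temporary file" is modelled as a copy of the file contents taken when the mode is enabled, and I assume that every write issued while the mode is on is lost when the cache is cleared.

```python
def _write_at(buf: bytearray, offset: int, data: bytes) -> None:
    """Write data at offset, zero-filling if the write extends the buffer."""
    end = offset + len(data)
    if end > len(buf):
        buf.extend(b"\x00" * (end - len(buf)))
    buf[offset:end] = data


class DropWritesModel:
    """Hypothetical model of one file under the proposed drop_writes mode."""

    def __init__(self, data: bytes = b"") -> None:
        self.base = bytearray(data)  # stands in for the real file
        self.overlay = None          # stands in for the temporary file

    def enable_drop_writes(self) -> None:
        # From now on, writes land in the overlay instead of the real file.
        self.overlay = bytearray(self.base)

    def write(self, offset: int, data: bytes) -> None:
        target = self.overlay if self.overlay is not None else self.base
        _write_at(target, offset, data)

    def read(self, offset: int, size: int) -> bytes:
        # Reads see the redirected writes, so the application keeps working.
        source = self.overlay if self.overlay is not None else self.base
        return bytes(source[offset:offset + size])

    def clear_cache(self) -> None:
        # Deleting the temporary file simulates the loss after power failure.
        self.overlay = None
```

In this model a write made before `enable_drop_writes()` survives `clear_cache()`, while any write made after it is visible to reads only until the cache is cleared.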

@mj-ramos
Collaborator

Hi, I apologize for the late reply. I've been quite busy lately. Thank you for your interest in LazyFS!

Certainly, LazyFS can accomplish that. If I understand correctly, the feature you are suggesting appears to be somewhat similar to the clear cache command, which removes all the contents of the cache at a certain point. With this command, reads are not compromised; however, if neither fsync nor fdatasync is issued, the writes will be dropped. It seems to me that you specifically want to discard writes for a particular file? Is this correct?
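
The difference between the two options in question can be sketched with a toy model. This is only an illustration under simplifying assumptions (append-only writes, whole-file sync) and not how LazyFS is implemented; `PageCacheModel` and `discard_file` are hypothetical names, with `discard_file` standing for the per-file variant being asked about.

```python
class PageCacheModel:
    """Toy model: persisted data per path plus unsynced data held in a cache."""

    def __init__(self) -> None:
        self.disk = {}   # path -> bytes that survived a sync
        self.cache = {}  # path -> full file contents including unsynced writes

    def write(self, path: str, data: bytes) -> None:
        # Append-only for simplicity; the write stays in the cache until synced.
        self.cache[path] = self.read(path) + data

    def read(self, path: str) -> bytes:
        # Reads are served from the cache first, so they are never compromised.
        return self.cache.get(path, self.disk.get(path, b""))

    def fsync(self, path: str) -> None:
        if path in self.cache:
            self.disk[path] = self.cache.pop(path)

    def clear_cache(self) -> None:
        # Existing behaviour as described: unsynced writes of all files are lost.
        self.cache.clear()

    def discard_file(self, path: str) -> None:
        # Hypothetical per-file variant: only this file loses unsynced writes.
        self.cache.pop(path, None)
```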
