Skip to content

Possible crash consistency issue with truncate using multiple file descriptors #9

@hayley-leblanc

Description

@hayley-leblanc

Hi Rohan,

I think I've found a new crash consistency issue in WineFS that occurs if multiple file descriptors are used to modify a file and we crash while increasing the file's size using truncate. The bug is a bit complicated and unfortunately I don't have a good way to replicate it outside of my testing infrastructure and don't fully understand the root cause yet, so let me know if anything doesn't make sense. I'm running WineFS in strict mode; I don't know if this occurs in relaxed mode as well.

Suppose we have a program that performs the following operations:

1. create file0
2. fallocate file0 in FALLOC_FL_KEEP_SIZE mode, offset 0x4ce0, len 0x1 
3. write 1 byte to file0
4. ftruncate file0 to 1 byte
5. write 1 byte to file0
6. ftruncate file0 to 64 bytes

The workload is a bit strange (it was generated by our fuzzer) but I've minimized it as much as I can without removing the ability to reproduce the issue.

If we open a file descriptor on line 1 and use it for the rest of the operations, everything works as expected. However, suppose we open a new file descriptor for file0 between lines 2 and 3 and use it for the write and ftruncate on lines 3 and 4, then open another one betwen lines 4 and 5 and use it for the final two operations (as in this program: contenthash-test.zip).

Now, let's say that we crash after the flush on line 2091 of inode.c (flushing the head of the truncate list in pmfs_truncate_add()) goes through and that data becomes durable when performing the second ftruncate(). When the file system is re-mounted and runs recovery, it will recover the truncate list and replay the truncate operation on the truncated inode. During normal/non-crashing execution, the truncate operation causes the extra bytes to be zeroed out, but it seems that this step is skipped here during recovery, and the recovered file can end up with some random garbage in it.

I haven't yet narrowed down a root cause for this; truncation during normal execution vs. recovery use mostly the same code, and I haven't had the opportunity to dig into where things differ. I suspect right now that the use of multiple file descriptors causes some weird regular-execution behavior where the persistent inode might not look the same as it would if we just used 1 file descriptor, so recovery goes a bit differently, but I'm not sure.

Let me know if this makes sense or you have any ideas about what might be going on. I'll keep looking into this issue as well.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions