When nova is unmounted, the snapshot cleaner kthread is stopped with kthread_stop() in nova_save_snapshots(). If the kthread calls schedule() while kthread_stop() is waiting in wait_for_completion(), the kthread goes to sleep forever waiting for a wakeup that never arrives, resulting in a hang.
Reproduction:
1. Mount a fresh nova instance using the 'mount -t NOVA -o init' command.
2. Unmount nova.
3. Remount nova at the same mount point.
4. Repeat steps 2 and 3 in a tight loop until the kernel hangs. In our experiments, we were able to reproduce the hang within 40 to 480 seconds, with an average of 254 seconds.

We wrote a script and a helper C program to reproduce the bug (Makefile and driver.c).
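The unmount/remount loop in steps 2 to 4 could be sketched roughly as follows. This is not the driver.c mentioned above; the function name remount_loop, the device path, and the mount point are hypothetical, and the sketch assumes that the remount in step 3 passes no mount options (only the initial mount in step 1 uses '-o init'). It must run as root on a kernel with nova loaded, after step 1 has already been done.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: unmount and remount a nova instance `iterations`
 * times. Assumes `mnt` is already mounted (step 1 above). Returns 0 if
 * every cycle succeeded, -1 on the first failing umount() or mount().
 * If the kernel hangs in umount(), this function never returns.
 */
int remount_loop(const char *dev, const char *mnt, long iterations)
{
    for (long i = 0; i < iterations; i++) {
        /* Step 2: unmount nova. */
        if (umount(mnt) != 0) {
            perror("umount");
            return -1;
        }
        /* Step 3: remount at the same mount point (assumed: no options). */
        if (mount(dev, mnt, "NOVA", 0, NULL) != 0) {
            perror("mount");
            return -1;
        }
    }
    return 0;
}
```

A caller might invoke it as remount_loop("/dev/pmem0", "/mnt/nova", 10000000), where both paths are placeholders for the local setup.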
Fix:
In the try-sleeping loop, the kthread now calls schedule() only if kthread_should_stop() evaluates to false, so it never goes to sleep after a stop has been requested.
    prepare_to_wait(&sbi->snapshot_cleaner_wait, &wait, TASK_INTERRUPTIBLE);
    if (!kthread_should_stop())
        schedule();
    finish_wait(&sbi->snapshot_cleaner_wait, &wait);
This fix follows standard practice found in other Linux file systems such as UBIFS and NFS.
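The idea behind the fix, re-checking the stop request immediately before sleeping, can be illustrated with a user-space analogy. The sketch below is not kernel code: it uses a pthread mutex and condition variable in place of prepare_to_wait() and schedule(), and all names (struct cleaner, cleaner_start, cleaner_stop, start_stop_cycles) are hypothetical. The should_stop flag stands in for kthread_should_stop().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the snapshot cleaner thread. */
struct cleaner {
    pthread_mutex_t lock;
    pthread_cond_t wait;
    bool should_stop;   /* plays the role of kthread_should_stop() */
    pthread_t thread;
};

static void *cleaner_main(void *arg)
{
    struct cleaner *c = arg;

    pthread_mutex_lock(&c->lock);
    /* Check the stop flag before every sleep; never sleep once it is set. */
    while (!c->should_stop)
        pthread_cond_wait(&c->wait, &c->lock);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

int cleaner_start(struct cleaner *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wait, NULL);
    c->should_stop = false;
    return pthread_create(&c->thread, NULL, cleaner_main, c);
}

/* Analogous to kthread_stop(): request the stop, wake the thread, wait. */
int cleaner_stop(struct cleaner *c)
{
    pthread_mutex_lock(&c->lock);
    c->should_stop = true;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
    return pthread_join(c->thread, NULL);
}

/* Start and stop the thread n times; returns 0 if no cycle failed. */
int start_stop_cycles(int n)
{
    for (int i = 0; i < n; i++) {
        struct cleaner c;

        if (cleaner_start(&c) != 0)
            return -1;
        if (cleaner_stop(&c) != 0)
            return -1;
        pthread_cond_destroy(&c.wait);
        pthread_mutex_destroy(&c.lock);
    }
    return 0;
}
```

Because the flag is tested under the same lock that the stopper holds while setting it, the worker cannot miss the stop request and then block. The kernel fix gets the same effect by testing kthread_should_stop() after prepare_to_wait() and before schedule().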
The linked patch fixes this bug. We ran the same scripts for 10 million iterations and 17 hours, and the bug did not trigger. The bug was discovered using Metis, a new tool that uses model checking to find file system bugs.
Code references:
linux-nova/fs/nova/snapshot.c, lines 1301 to 1306 in 976a4d1
linux-nova/fs/nova/snapshot.c, lines 1319 to 1326 in 976a4d1
Signed-off-by: Gautam Ahuja <gaahuja@cs.stonybrook.edu>
Signed-off-by: Yifei Liu <yifeliu@cs.stonybrook.edu>
Signed-off-by: Erez Zadok <ezk@cs.stonybrook.edu>