A FUSE file system with an internal dedicated page cache that only flushes data if explicitly requested by the application. This is useful for simulating power failures and losing all unsynced data.
LazyFS was tested with ext4 (with the default mount options) as the underlying file system (FUSE backend), in both Debian 11 (bullseye) and Ubuntu 20.04 (focal) environments. It is C++17 compliant and requires the following packages to be installed:
CMake and g++:
sudo apt install g++ cmake
# The following versions are used during development:
# cmake: 3.16.3
# g++: 9.4.0
FUSE 3:
sudo apt install libfuse3-dev libfuse3-3 fuse3
FUSE requires the option allow_other as a startup argument so that users other than the owner of the file system can read and write files. To allow it, uncomment or add the following line in the configuration file /etc/fuse.conf:
user_allow_other
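Before mounting, it can help to confirm that the option is actually active. The following is a minimal sketch, assuming a POSIX shell and grep; the function name fuse_allow_other_enabled is hypothetical and not part of LazyFS:

```shell
# Hypothetical helper (not part of LazyFS): succeeds if an uncommented
# user_allow_other line exists in the given FUSE configuration file.
# $1: path of the configuration file (defaults to /etc/fuse.conf).
fuse_allow_other_enabled() {
    conf="${1:-/etc/fuse.conf}"
    # Match the option alone on a line, allowing surrounding whitespace.
    # A commented line ("#user_allow_other") does not match.
    grep -qE '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf"
}

# Usage:
#   fuse_allow_other_enabled || echo "enable user_allow_other in /etc/fuse.conf"
```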
Compile and install the caching library libpcache, which will be attached to LazyFS:
cd libs/libpcache && ./build.sh && cd -
Finally, build lazyfs:
cd lazyfs && ./build.sh && cd -
LazyFS uses a TOML configuration file to set up the cache and a named pipe (FIFO) through which fault commands are sent:
[faults]
fifo_path="/tmp/faults.fifo"
# fifo_path_completed="/tmp/faults_completed.fifo"
[cache]
apply_eviction=false
[cache.simple]
custom_size="0.5GB"
blocks_per_page=1
# [cache.manual]
# io_block_size=4096
# page_size=4096
# no_pages=10
[filesystem]
log_all_operations=false
logfile="/tmp/lazyfs.log"
[[injection]]
type="torn-seq"
op="write"
file="output.txt"
persist=[1,4]
occurrence=2
[[injection]]
type="torn-op"
file="output1.txt"
occurrence=5
parts=3 #or parts_bytes=[4096,3600,1260]
persist=[1,3]
[[injection]]
type="clear"
from="f1.txt"
timing="before"
op="fsync"
occurrence=6
crash=true
We recommend following the simple cache configuration (indicating the cache size and using a configuration file similar to default.toml), since it is currently the most tested schema in our experiments. Additionally, for the section [cache], you can specify the following:
- apply_eviction: Whether the cache should behave like the real page cache, evicting pages when the cache fills to the maximum.
- [cache.simple] or [cache.manual]: Sets up the cache size and internal organization. For now, you can just follow the example, setting custom_size (in GB/MB/KB, as in "0.5GB") and the number of blocks in each page (you can leave 1 as the default). For manual configurations, comment out the simple configuration and uncomment/change the [cache.manual] example above to suit your needs.
Optionally, users can specify a set of predefined injection faults before LazyFS starts running. These faults are introduced as additional features, namely:
- torn-seq: This fault type applies when a sequence of system calls targeting a single file is executed consecutively without an intervening fsync. In the example, during the second group of consecutive writes (the group number is defined by the parameter occurrence) to the file "output.txt", the first and fourth writes will be persisted to disk (the writes to be persisted are defined by the parameter persist). After the fourth write (the last in the persist vector), LazyFS will crash itself.
- torn-op: This fault type divides a write system call into smaller parts, with some of these parts being persisted while others are not. In the example, the fifth write issued (the number of the write is defined by the parameter occurrence) to the file "output1.txt" will be divided into three equal parts if the parts parameter is used, or into parts of custom sizes if the parts_bytes parameter is defined. The comment in the example shows parts_bytes in use: the write will be torn into three parts, the first with 4096 bytes, the second with 3600 bytes, and the last with 1260 bytes. The persist vector determines which parts will be persisted. After these parts are persisted, LazyFS will crash.
- clear-cache (type="clear" in the configuration): Clears unsynced data at a certain point of the execution. In the example above, this fault will be injected before (timing) the sixth (occurrence) fsync (op) to the file "f1.txt" (from). The op parameter must be a system call, and if it involves two paths (such as rename), the to parameter should also be specified. The crash parameter determines whether LazyFS should crash after the fault injection.
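To make the torn-op arithmetic concrete, the sketch below prints which byte ranges of a single write would be persisted for a given parts_bytes list and persist vector. It only illustrates the offsets implied by the description above; it is not how LazyFS implements the fault, and the function name torn_op_ranges is hypothetical:

```shell
# Illustration only (not LazyFS code): prints the byte ranges of one write,
# relative to the start of its buffer, and whether each part is persisted.
# $1: comma-separated part sizes, e.g. "4096,3600,1260" (parts_bytes)
# $2: comma-separated 1-based part numbers to persist, e.g. "1,3" (persist)
torn_op_ranges() {
    parts_bytes="$1"
    persist="$2"
    offset=0
    index=1
    # Split the comma-separated sizes without relying on shell arrays.
    for size in $(printf '%s' "$parts_bytes" | tr ',' ' '); do
        end=$((offset + size - 1))
        # Wrap both sides in commas so that "1" does not match "11".
        case ",$persist," in
            *",$index,"*) state="persisted" ;;
            *)            state="not persisted" ;;
        esac
        echo "part $index: bytes $offset-$end $state"
        offset=$((offset + size))
        index=$((index + 1))
    done
}

# Usage:
#   torn_op_ranges "4096,3600,1260" "1,3"
```

With the values from the example configuration, the three parts cover bytes 0-4095, 4096-7695 and 7696-8955 of the write, and only the first and third ranges are reported as persisted.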
Other parameters:
- fifo_path: The absolute path where the faults FIFO should be created.
- fifo_path_completed: If we plan to inject the clear cache fault synchronously, it is necessary to determine the completion of the lazyfs::clear-cache command execution. By specifying this parameter, a message will be written to another FIFO (finished::clear-cache), so that users can set up a reader process that waits before making any post-fault consistency checks.
- log_all_operations: Whether to log all file system operations that LazyFS receives.
- logfile: The log file for LazyFS's outputs. Fault acknowledgment is sent to stdout or to the logfile.
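A reader process for fifo_path_completed could look like the sketch below. It assumes that both FIFOs already exist, that LazyFS writes one newline-terminated line to the completion FIFO once clear-cache has finished, and that the shell runs on Linux; the function name clear_cache_sync is hypothetical:

```shell
# Hypothetical helper (not part of LazyFS): sends clear-cache and blocks
# until a completion line arrives.
# $1: faults FIFO (fifo_path), $2: completion FIFO (fifo_path_completed).
clear_cache_sync() {
    faults_fifo="$1"
    completed_fifo="$2"
    # Open the completion FIFO read-write on fd 3 before sending the command,
    # so the open does not block and the notification cannot be missed.
    # Assumption: Linux behaviour; POSIX leaves O_RDWR on a FIFO unspecified.
    exec 3<>"$completed_fifo"
    printf '%s\n' "lazyfs::clear-cache" > "$faults_fifo"
    # Blocks until one full line has been written to the completion FIFO.
    IFS= read -r reply <&3
    exec 3<&-
    printf '%s\n' "$reply"
}

# Usage (paths taken from the example configuration):
#   clear_cache_sync /tmp/faults.fifo /tmp/faults_completed.fifo
```

Post-fault consistency checks can then run after the function returns.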
To run the file system, one could use the mount-lazyfs.sh script, which calls FUSE with the correct parameters:
cd lazyfs/
# Running LazyFS in the foreground (add '-f/--foreground')
./scripts/mount-lazyfs.sh -c config/default.toml -m /tmp/lazyfs.mnt -r /tmp/lazyfs.root -f
# Running LazyFS in the background
./scripts/mount-lazyfs.sh -c config/default.toml -m /tmp/lazyfs.mnt -r /tmp/lazyfs.root
# Umount with
./scripts/umount-lazyfs.sh -m /tmp/lazyfs.mnt/
# Display help
./scripts/mount-lazyfs.sh --help
./scripts/umount-lazyfs.sh --help
Finally, one can control LazyFS by echoing the following commands to the configured FIFO:
- Clear cache - clears all unsynced data:
echo "lazyfs::clear-cache" > /tmp/faults.fifo
- Checkpoint - checkpoints all unsynced data by calling write to the underlying file system (without fsync):
echo "lazyfs::cache-checkpoint" > /tmp/faults.fifo
Note: Any subsequent failure is outside of the test control.
- Show usage - displays the current cache usage (percentage of pages allocated):
echo "lazyfs::display-cache-usage" > /tmp/faults.fifo
- Report unsynced data, which displays the inodes that have data in cache:
echo "lazyfs::unsynced-data-report" > /my/path/faults.fifo
- Kill the filesystem, which is triggered by an operation, a timing and a path regex:
  Here timing should be one of before or after, and op should be a valid system call name (e.g. write or read).
  - In the case of operations that have a source path only (e.g. create, open, read, write, ...):
    echo "lazyfs::crash::timing=...::op=...::from_rgx=..." > /my/path/faults.fifo
    Here, from_rgx is required (do not specify to_rgx).
  - For rename, link and symlink, one is able to specify the destination path:
    echo "lazyfs::crash::timing=...::op=...::from_rgx=...::to_rgx=..." > /my/path/faults.fifo
    Here, at least one of from_rgx or to_rgx is required.
Example 1:
echo "lazyfs::crash::timing=before::op=write::from_rgx=file1" > /my/path/faults.fifo
Kills LazyFS before executing a write operation to the file pattern 'file1'.
Example 2:
echo "lazyfs::crash::timing=before::op=link::from_rgx=file1::to_rgx=file2" > /my/path/faults.fifo
Kills LazyFS before executing a link operation from the file pattern 'file1' to the file pattern 'file2'.
Example 3:
echo "lazyfs::crash::timing=before::op=rename::to_rgx=fileabd" > /my/path/faults.fifo
Kills LazyFS before executing a rename operation to the file pattern 'fileabd'.
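When crash commands are generated from a test script, a small helper can assemble the string and reject obviously malformed input. This is a sketch under the syntax described above; the function name lazyfs_crash_cmd is hypothetical, and it does not check whether the chosen op actually accepts a destination path:

```shell
# Hypothetical helper (not part of LazyFS): prints a lazyfs::crash command.
# $1: timing (before|after), $2: op, $3: from_rgx (may be empty),
# $4: to_rgx (optional).
lazyfs_crash_cmd() {
    timing="$1"
    op="$2"
    from_rgx="$3"
    to_rgx="${4:-}"
    case "$timing" in
        before|after) ;;
        *) echo "timing must be 'before' or 'after'" >&2; return 1 ;;
    esac
    if [ -z "$from_rgx" ] && [ -z "$to_rgx" ]; then
        echo "at least one of from_rgx or to_rgx is required" >&2
        return 1
    fi
    cmd="lazyfs::crash::timing=${timing}::op=${op}"
    if [ -n "$from_rgx" ]; then
        cmd="${cmd}::from_rgx=${from_rgx}"
    fi
    if [ -n "$to_rgx" ]; then
        cmd="${cmd}::to_rgx=${to_rgx}"
    fi
    printf '%s\n' "$cmd"
}

# Usage, reproducing Example 1:
#   lazyfs_crash_cmd before write file1 > /my/path/faults.fifo
```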
- Kill the filesystem after injecting torn-op or torn-seq faults:
  The parameters are the same as the ones presented in the configuration file above. Parameters that have multiple values must be specified without the brackets (e.g., persist=1,2).
  - echo "lazyfs::torn-op::file=...::persist=...::parts=...::occurrence=..." > /my/path/faults.fifo
  - echo "lazyfs::torn-seq::op=...::file=...::persist=...::occurrence=..." > /my/path/faults.fifo
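As an illustration of the bracket rule, the sketch below converts list values written as in the configuration file (for example "[1,4]") into the form used on the FIFO and builds the two commands. The function names are hypothetical, and the values in the usage lines are the ones from the example configuration:

```shell
# Hypothetical helpers (not part of LazyFS).

# Turns a TOML-style list such as "[1, 4]" into "1,4".
strip_brackets() {
    printf '%s\n' "$1" | sed 's/[][ ]//g'
}

# $1: op, $2: file, $3: persist list, $4: occurrence
lazyfs_torn_seq_cmd() {
    printf 'lazyfs::torn-seq::op=%s::file=%s::persist=%s::occurrence=%s\n' \
        "$1" "$2" "$(strip_brackets "$3")" "$4"
}

# $1: file, $2: persist list, $3: parts, $4: occurrence
lazyfs_torn_op_cmd() {
    printf 'lazyfs::torn-op::file=%s::persist=%s::parts=%s::occurrence=%s\n' \
        "$1" "$(strip_brackets "$2")" "$3" "$4"
}

# Usage:
#   lazyfs_torn_seq_cmd write output.txt "[1,4]" 2 > /my/path/faults.fifo
#   lazyfs_torn_op_cmd output1.txt "[1,3]" 3 5 > /my/path/faults.fifo
```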
LazyFS expects every buffer written to the FIFO to end with a newline character (echo does this by default). Thus, if using pwrite, for example, make sure the buffer ends with \n.
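From a shell script, one way to guarantee the trailing newline regardless of how the command string was built is to send it with printf. A minimal sketch; the function name lazyfs_send is hypothetical:

```shell
# Hypothetical helper (not part of LazyFS): writes one command to the faults
# FIFO, always followed by exactly one newline.
# $1: path of the faults FIFO, $2: command string without trailing newline.
lazyfs_send() {
    fifo="$1"
    cmd="$2"
    # Refuse to write to a regular file created by mistake at that path.
    if [ ! -p "$fifo" ]; then
        echo "not a FIFO: $fifo" >&2
        return 1
    fi
    printf '%s\n' "$cmd" > "$fifo"
}

# Usage:
#   lazyfs_send /tmp/faults.fifo "lazyfs::display-cache-usage"
```

The write blocks until a reader (here, LazyFS) has the FIFO open, which is the usual behaviour of named pipes.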
For additional information regarding possible improvements and collaborations, please open an issue or contact @devzizu, @mj-ramos and @dsrhaslab.