Lessons learned from optimizing zxfer #941
Most of them are, with the caveat that we're extremely serious about safety--you called out "the documentation says syncoid waits to destroy until after successful replication" and that is both correct and not something we're ever going to change. Aside from that, parallelization isn't necessarily a terrible idea, although it can add a lot of code complexity and potential for race conditions that we just don't have as things stand. In particular, batching up zfs destroy operations with a comma-separated list is an improvement I've been low-key intending to make for quite some time... but it wouldn't hurt my feelings a bit if you submitted PRs instead. The one thing I'd ask is that you do small PRs for the individual optimizations, rather than lobbing a giant PR that does tons of things over the wall in one go. Thanks for your interest, and for patching up zxfer! I know that project has been hurting for maintenance for a while. =)
Hello!
I echo the kudos that I've seen in these discussions. These are great tools, humbly doing important daily work!
For years, I've been using `zxfer` on FreeBSD. Earlier in the year, I started seeing very poor performance when replicating multiple hosts into multiple backup servers, each with dozens of datasets due to hosting several ZFS-backed jails. Because of the slow performance and occasional lockups, I peeked under the hood to see what `zxfer` was doing. There was no compression, and `grep`s were running at O(n^2) levels! Happy to see that `syncoid` not only implements compression but has several options to choose from. Once I started digging, I ended up refactoring most of `zxfer`'s replication-specific code while maintaining its use of `/bin/sh`. In the process of optimization, I tried various techniques and learned a few things which I thought I'd share. Your tool is already much faster in many use cases thanks to using `perl`. No need to spawn thousands of `grep`s! Since I'm working with hundreds of datasets, several hosts, and thousands of snapshots, there are use cases where my fork exceeds the performance of `syncoid`, and I'd love to see `syncoid` implement some of the optimizations if it's within your design vision.
- Parallelization
`zfs list -t snap -s creation` is slow and a bottleneck, especially when there are many snapshots, as ZFS has to search the metadata. I noticed `syncoid` executes this as it iterates through each dataset. While waiting for this process to finish, it is possible to do some background work such as performing deletions. The documentation states that deletions are only performed after a successful send. My `zxfer` fork runs all `zfs destroy` operations as background processes and doesn't wait for them to complete (the snapshots are batched within each `zfs destroy` command).
In general, anytime operations are performed independently on the source and host, it's helpful to run them asynchronously and wait for the longest process.
`syncoid` concurrent transfers would be a good use of time while waiting for `zfs list -s creation` to finish.
The number of jobs that can run concurrently depends on the number of disks in a pool, CPU cores, and storage type (SSD vs HD), and is probably best supplied by the user.
- Deletion batching
Fewer `zfs destroy`s need to spawn.
- More Compression (how about that Hutter prize?)
`syncoid` already does a great job of this. I also compress the output of `zfs list` commands on remote hosts. When there are thousands of snapshots, the output can be several megabytes, and it is highly compressible.
- Initial `zfs list` caching and selective iteration
`zfs list -o name -t snapshot` is executed concurrently on the host and target at the beginning of execution (without sorting by creation time). Prior to getting a list of source snapshots ordered by creation time, the datasets can be compared, and only those datasets that don't have matching snapshots need to be iterated over. This saves significant time when performing a replication after a `syncoid` run has just finished.
When `zfs list -t snap -s creation` is run at the beginning of the run loop, it can be sped up tremendously by using `parallel` to combine the non-mangled output of each dataset into one file. (I know that listing snapshots for all datasets at the beginning differs from `syncoid`'s implementation.)
- Repeating replication until there are no changes (Yield)
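To make the selective iteration idea from the caching bullet a bit more concrete, here is a rough sketch in plain `/bin/sh`. It is only an illustration, not code from my fork or from `syncoid`: `datasets_needing_sync` is a made-up helper name, and it assumes the two snapshot lists (for example from `zfs list -H -o name -t snapshot`) have already been saved to files with the source and target dataset names mapped to the same prefix. Any dataset with a source snapshot that is missing on the target is reported as needing iteration.

```shell
#!/bin/sh
# Hypothetical helper (not part of zxfer or syncoid): print the datasets that
# have at least one snapshot on the source that is missing from the target.
# $1: file of source snapshot names, one "dataset@snapshot" per line
# $2: file of target snapshot names, same format and same dataset prefix
datasets_needing_sync() {
    src_sorted=$(mktemp) || return 1
    tgt_sorted=$(mktemp) || return 1

    # comm requires both inputs to be sorted with the same collation
    LC_ALL=C sort -u "$1" > "$src_sorted"
    LC_ALL=C sort -u "$2" > "$tgt_sorted"

    # comm -23 keeps lines found only in the first file, i.e. snapshots
    # missing on the target; strip "@snapshot" and de-duplicate the datasets
    LC_ALL=C comm -23 "$src_sorted" "$tgt_sorted" |
        sed 's/@.*$//' |
        LC_ALL=C sort -u

    rm -f "$src_sorted" "$tgt_sorted"
}
```

Only the datasets printed by this helper would then need the slower `zfs list -t snap -s creation` call; an empty result means the run can finish right away.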
I wanted to share some of the real-world optimizations that made a difference in overall run time and see if any are of interest to this project.
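As an illustration of the deletion batching and background destroys mentioned in the list above, here is a minimal sketch. `zfs destroy` accepts several snapshots of one dataset as a comma-separated list after the `@`. The helper names `batch_destroy_arg` and `destroy_batch_bg`, and the `ZFS_CMD` override, are made up for this example and are not taken from either tool.

```shell
#!/bin/sh
# Hypothetical helper: read "dataset@snap" lines (all from the same dataset)
# on stdin and print a single "dataset@snap1,snap2,..." argument.
batch_destroy_arg() {
    awk -F'@' '
        NR == 1 { printf "%s@%s", $1, $2; next }  # first line keeps the dataset name
        { printf ",%s", $2 }                      # later lines add the snapshot part only
        END { if (NR > 0) printf "\n" }
    '
}

# ZFS_CMD is an assumption of this sketch so the command can be swapped
# out (for example with "echo zfs") when experimenting.
ZFS_CMD=${ZFS_CMD:-zfs}

# Destroy a batch of snapshots of one dataset in the background.
# $1: file with one "dataset@snap" name per line
destroy_batch_bg() {
    arg=$(batch_destroy_arg < "$1")
    [ -n "$arg" ] || return 0
    # One process for the whole batch instead of one per snapshot;
    # "&" lets the caller keep working while the destroy runs.
    $ZFS_CMD destroy "$arg" &
}
```

In keeping with the safety point in the reply, a caller would only invoke `destroy_batch_bg` after the corresponding send has succeeded, and could call `wait` before exiting if it needs the destroys to have finished.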
In the end, my `zxfer` fork replication times dropped from literal hours to seconds. `syncoid` is already much faster than the original `zxfer` due to its support of compression, mbuffer, etc.
Thanks for reading!
Aldo
https://github.com/totalAldo/zxfer