Restore Performance #629
Replies: 3 comments 1 reply
-
@kalleyne Thanks for opening this discussion. Actually, I'm not sure I can confirm that CPU usage is generally the bottleneck when restoring. Using rustic, a restore took <3 s and reported >600 MiB/s (#624 even increases the restore rate). For comparison:
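As a side note, a minimal Rust sketch (my own, with made-up byte counts and timings, not rustic code) of how a MiB/s figure like the one above falls out of bytes restored and elapsed wall-clock time:

```rust
use std::time::Duration;

/// Hypothetical helper: restored bytes + elapsed time -> MiB/s.
fn restore_rate_mib_s(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / elapsed.as_secs_f64()
}

fn main() {
    // Made-up values: ~1.8 GiB restored in just under 3 s gives >600 MiB/s.
    let rate = restore_rate_mib_s(1_800 * 1024 * 1024, Duration::from_secs_f64(2.9));
    println!("{rate:.0} MiB/s"); // prints "621 MiB/s"
}
```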
-
Hello @aawsome, I think our experience during the roughly week-long restore broadly matches the results you see with your test. Our impression is that LAN-based disk I/O is probably the first bottleneck that users encounter with restic or rustic restores. The performance of SSDs and NVMe drives is in most cases better than that of spinning disks. Still, I think it's safe to state that disk throughput remains the limiting factor in many deployments, based on some quick estimates from MinIO, for example: https://min.io/docs/minio/linux/operations/checklists/hardware.html
But, yes, we are sure there could be a next-level bottleneck at the CPU if disk I/O runs at the speed of virtual memory or a RAM disk and we wanted to push restore performance beyond a throughput of 5-10 Gbit/s, which would be desirable. Unfortunately, for many of our customers there seems to have been a lot of pre-conditioning in which rsync has been equated with backups. Multiple simultaneous rsync jobs over SMB or even SSH can get close to saturating 10 GbE. So while restic/rustic restores do a lot of CPU-bound work, and while restic/rustic repositories bring a huge list of positive qualities to the table, most people are not comparing apples with apples and will only look at the final result and declare that "multiple rsync restores are faster" than any single restic/rustic restore. So yes, training and preparing staff is part of the answer. But we are not choosy, and we will gladly accept any rustic performance improvements that are open for consideration. Thanks.
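To put rough numbers on that (my own back-of-the-envelope arithmetic, not figures from the thread): 10 Gbit/s is 10/8 = 1.25 GB/s ≈ 1.16 GiB/s, while a single 7200 rpm spinning disk typically sustains only on the order of 150-250 MB/s sequentially, and considerably less under random writes. So saturating a 10 GbE link during a restore already presupposes an SSD/NVMe target or a multi-disk array before the CPU can even become the limiting factor.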
-
I think I can prepare a rustic version where the relevant concurrency parameters can be customized, and where we get debugging information about which part of the processing pipeline is waiting for which other part. But in order to do that I need some time, so this could take a couple of days.

About the comment from @wscott: in fact the situation is even worse: rustic picks chunks from the backend in a quite random order, decrypts and decompresses them, and then writes each chunk to every destination where it is used. That is, the data is written to the restore destination very randomly and in parallel. I have also thought about bringing this random access into some order so that the probability of writing sequentially to a file on the destination increases a lot; #624 is actually some preliminary work for that. But anyway, I think we should start by optimizing the throughput (which includes fine-tuning the number of parallel writes) and then optimize local writes in a second step.
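To make the write-ordering idea concrete, here is a minimal Rust sketch (my own illustration, not rustic's actual code; `WriteTarget` and `plan_sequential_writes` are hypothetical names) of grouping randomly ordered chunk writes by destination file and sorting each group by offset, so that each file can be written as sequentially as possible:

```rust
use std::collections::BTreeMap;

// Hypothetical type for illustration: one planned write of an already
// decrypted and decompressed chunk at `offset` inside the file `path`.
#[derive(Debug, Clone)]
struct WriteTarget {
    path: String,
    offset: u64,
    chunk_id: u32,
}

/// Group the randomly ordered write targets by destination file and
/// sort each group by offset, so a worker draining one group writes
/// to that file as sequentially as possible.
fn plan_sequential_writes(targets: Vec<WriteTarget>) -> BTreeMap<String, Vec<WriteTarget>> {
    let mut plan: BTreeMap<String, Vec<WriteTarget>> = BTreeMap::new();
    for t in targets {
        plan.entry(t.path.clone()).or_default().push(t);
    }
    for writes in plan.values_mut() {
        writes.sort_by_key(|t| t.offset);
    }
    plan
}

fn main() {
    // Chunks arrive from the backend in an effectively random order,
    // and a shared chunk (id 7 here) is used by more than one file.
    let targets = vec![
        WriteTarget { path: "b.bin".into(), offset: 4096, chunk_id: 7 },
        WriteTarget { path: "a.bin".into(), offset: 8192, chunk_id: 3 },
        WriteTarget { path: "a.bin".into(), offset: 0, chunk_id: 9 },
        WriteTarget { path: "b.bin".into(), offset: 0, chunk_id: 7 },
    ];
    for (path, writes) in plan_sequential_writes(targets) {
        println!("{path}: {writes:?}");
    }
}
```

A pool of writer threads (its size being one of the tunable parameters mentioned above) could then drain one file's queue at a time, turning most of the random writes into sequential ones, at the cost of buffering or re-fetching chunks that are shared across files.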
-
This is a placeholder.
Original thread: https://forum.restic.net/t/restic-restore-showing-a-sharp-reduction-of-rx-receive-speed-over-time/6199