Benchmark is unfair. find(1) should not be used for grepping. #1395
Unfortunately, the regex implementation in GNU
This is on a checkout of rust-lang/rust (see the
I agree that the benchmarks should probably be updated, and we should make the comparisons as fair as possible. #893 is relevant.
Relevant stuff: I understand that people use them out of habit, but the design of the program is rather dubious. But yeah, I could change "should not be used" to something like "are suboptimal, both in performance and in simplicity". By piping to grep(1), you get faster results, and you don't even need to check the find(1) manual page for all the similar-but-different options it has for filtering file names.
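As a minimal sketch of the point above, here is the same selection done two ways: with one of find(1)'s own matching primaries, and with a single grep(1) regex over find's output. It assumes a find(1) that supports `-iname` (GNU and BSD do, but it is not in POSIX); the directory layout and file names are made up for illustration.

```shell
#!/bin/sh
# Throwaway demo tree (hypothetical names); removed when the script exits.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
mkdir -p "$demo_dir/src/sub"
touch "$demo_dir/src/a1.c" "$demo_dir/src/b.c" \
      "$demo_dir/src/sub/c2.C" "$demo_dir/src/sub/d3.h"
cd "$demo_dir" || exit 1

# find(1) filters by itself: -iname matches each basename against a
# case-insensitive shell glob (a GNU/BSD extension, not POSIX).
by_find=$(find . -iname '*[0-9].c' | sort)

# find(1) only lists paths; grep(1) filters them with an ordinary regex.
by_grep=$(find . | grep -i '[0-9]\.c$' | sort)

printf 'find -iname:\n%s\n\nfind | grep:\n%s\n' "$by_find" "$by_grep"
```

Both variables should hold the same two paths, `./src/a1.c` and `./src/sub/c2.C`. The grep(1) form uses the same regex syntax no matter what is being filtered, which is the simplicity argument made above.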
The pipe shouldn't be a bottleneck. The bottleneck is usually I/O.

```
$ time find >/dev/null
real 0m0.327s
user 0m0.091s
sys 0m0.233s
$ time find | grep -i '[0-9]\.c$' >/dev/null
real 0m0.335s
user 0m0.098s
sys 0m0.250s
```

You can see that piping all filenames only adds a little bit of time to a simple find(1). Also, real time is not the only thing that matters. fdfind(1), with the appropriate patches and optimizations, may beat

```
alx@debian:~$ time find | grep -i '[0-9]\.c$' >/dev/null
real 0m0.358s
user 0m0.098s
sys 0m0.276s
alx@debian:~$ time fdfind -u | grep -i '[0-9]\.c$' >/dev/null
real 0m0.390s
user 0m1.831s
sys 0m6.067s
alx@debian:~$ time fdfind -u '[0-9]\.c$' >/dev/null
real 0m0.385s
user 0m1.846s
sys 0m6.222s
```

That's fine if the bottleneck is in find(1), but if I pipe this command into a more demanding pipeline where find(1) is not the bottleneck, I fear that it may actually be slower, and that it will occupy CPU that could be running other tasks. If I have other heavy tasks running at the same time, like compiling some software, that will probably also affect the performance of fdfind(1) significantly, while
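To put the CPU concern above in numbers, divide total CPU time (user + sys) by wall-clock (real) time: a result near 1 means roughly one core was busy on average, and a result near 20 means roughly twenty. The helper below is a hypothetical sketch that only redoes this arithmetic on the timings quoted above.

```shell
#!/bin/sh
# cpu_ratio USER SYS REAL (hypothetical helper): average number of CPU
# cores kept busy, computed as (user + sys) / real. All values in seconds.
cpu_ratio() {
    LC_ALL=C awk -v u="$1" -v s="$2" -v r="$3" \
        'BEGIN { printf "%.2f\n", (u + s) / r }'
}

# Timings copied from the transcripts above.
echo "find | grep:        $(cpu_ratio 0.098 0.276 0.358)"
echo "fdfind -u | grep:   $(cpu_ratio 1.831 6.067 0.390)"
echo "fdfind -u PATTERN:  $(cpu_ratio 1.846 6.222 0.385)"
```

This gives about 1.04 for `find | grep` and about 20.25 and 20.96 for the two fdfind(1) runs: similar wall-clock time, but roughly twenty times the CPU, which is the CPU that other jobs (such as a concurrent compile) would be competing for.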
Those filenames are already broken (they are unportable, according to POSIX https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_06). Please don't use them. :) If you need them, though, you can use
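POSIX defines a portable filename character set made of the letters A-Z and a-z, the digits 0-9, and the period, underscore, and hyphen. As a sketch, assuming a POSIX find(1) and the C locale (so the bracket ranges stay ASCII-only), names outside that set, including names with spaces or embedded newlines that break line-oriented pipelines such as `find | grep`, can be listed like this. The helper name is hypothetical.

```shell
#!/bin/sh
# list_unportable DIR (hypothetical helper): print paths under DIR whose
# name contains a character outside the POSIX portable filename character
# set [A-Za-z0-9._-]. LC_ALL=C keeps the bracket ranges ASCII-only.
# A name containing a newline is still printed, but spans two lines.
list_unportable() {
    LC_ALL=C find "$1" -name '*[!A-Za-z0-9._-]*'
}

# Demo on a throwaway directory.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/ok_1.c" "$demo_dir/also-ok.txt" "$demo_dir/bad name.c"
list_unportable "$demo_dir"    # prints only the path ending in "bad name.c"
```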
Thanks.
Thanks for this link, very cool!
This is often true, but in my personal use the whole tree is usually in cache (the dcache, or at least the buffer cache). Most of the cost then comes from kernel and syscall overhead.
That benchmark still has
Anyway, more important than things like
Agreed.
:-)
A 7% loss on the pipe seems quite reasonable. How much does fdfind(1) take on, say,
Yup, those things must run under find(1), but they are actually fast, aren't they? The problem with find(1)'s performance, AFAIK, is just with name filtering. Do you have benchmarks comparing those things against find(1)?
Interesting;
Unless you have an incredibly large number of executables installed, /usr/bin won't be very large, relatively speaking. It is also pretty flat, which I think hinders parallelization.
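To put a rough number on "flat" versus "deep", one sketch (assuming POSIX find(1) and awk(1); the helper name is hypothetical) prints how many directories sit at each depth below a starting point. A flat tree such as /usr/bin reports a single directory (itself), while a home directory spreads over many levels. This only measures shape; whether shape limits fd's parallelism is the claim made in the comment above.

```shell
#!/bin/sh
# depth_histogram DIR (hypothetical helper): print "DEPTH COUNT" for the
# directories under DIR, where DIR itself is depth 0. Depth is the number
# of '/' separators in the relative path that find prints from inside DIR.
# Directory names containing newlines would be miscounted.
depth_histogram() {
    (cd "$1" && find . -type d) |
        awk -F/ '{ depth[NF - 1]++ } END { for (d in depth) print d, depth[d] }' |
        sort -n
}
```

For example, `depth_histogram /usr/bin` versus `depth_histogram ~` shows the difference in shape directly.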
Well, it's nearly 3,000 executables.
That's a small thing, actually. Compare to this:

```
alx@debian:~$ time find ~ | wc -l
660487
real 0m0.338s
user 0m0.080s
sys 0m0.273s
alx@debian:~$ time find ~ -type f | wc -l
603577
real 0m0.344s
user 0m0.096s
sys 0m0.262s
alx@debian:~$ time find ~ -type d | wc -l
53914
real 0m0.314s
user 0m0.091s
sys 0m0.223s
```

:)
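The three counts above do not add up: 603,577 regular files plus 53,914 directories is 657,491, so 2,996 of the 660,487 lines are something else (for example symlinks, or extra lines produced by names containing newlines). As a sketch, assuming a POSIX find(1) with `-exec ... {} +` and POSIX ls(1), one traversal can tally every entry by type instead of walking the tree three times; the helper name is hypothetical.

```shell
#!/bin/sh
# count_by_type DIR (hypothetical helper): walk DIR once and tally entries
# by the type character ls -l prints first: '-' regular file, 'd' directory,
# 'l' symbolic link, and so on. ls -q prints non-printable characters in
# names (including newlines) as '?', so every entry stays on one line.
count_by_type() {
    find "$1" -exec ls -ldq {} + |
        awk '{ n[substr($0, 1, 1)]++ } END { for (t in n) print t, n[t] }' |
        LC_ALL=C sort
}
```

This spawns ls(1) in batches, so it is meant for inspecting a tree, not for benchmarking.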
Hi,
I believe the benchmarks you provide against find(1) are unfair. find(1) should not be used for grepping; following the Unix principles, find(1) should just find, and grep(1) should be responsible for filtering the output of find(1).
If we pipe find(1) into grep(1), it is significantly faster than using find(1) alone:
`find | grep` seems to be faster than `fdfind` in my own simple test:
Can you please provide benchmarks against this pipeline in your README?
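A fair benchmark of the kind requested should first confirm that the compared commands print the same set of paths; otherwise the faster one may simply be doing less work. A minimal sketch, assuming POSIX sh plus a mktemp(1) that can be called without arguments (GNU coreutils and the BSDs support this); the helper name is hypothetical and each command is passed as a string to `sh -c`:

```shell
#!/bin/sh
# same_results CMD1 CMD2 (hypothetical helper): succeed only if the two
# shell commands print the same lines, ignoring order. Meant to be run
# before timing, so a benchmark never pits pipelines that select
# different files against each other.
same_results() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    sh -c "$1" | LC_ALL=C sort >"$tmp1"
    sh -c "$2" | LC_ALL=C sort >"$tmp2"
    cmp -s "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

For example, `same_results "find . -iname '*[0-9].c'" "find . | grep -i '[0-9]\.c\$'"` should succeed, while swapping `-iname` for the case-sensitive `-name` fails as soon as a file like `c2.C` exists. Depending on the tool, paths may need normalizing (for example, a leading `./`) before they compare equal.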
What version of `fd` are you using?

```
$ dpkg -l | grep fd-find
ii  fd-find  8.7.0-3+b1  amd64  Simple, fast and user-friendly alternative to find
```
Just for completeness, here's my CPU:
Thanks!