Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] performance regression in trivial searches between 8.4 and 8.5 persisting in 8.7 #1362

Closed
1 task done
moshelooks opened this issue Aug 4, 2023 · 5 comments · Fixed by #1422
Closed
1 task done
Labels

Comments

@moshelooks
Copy link

Checks

  • I have read the troubleshooting section and still think this is a bug.

Describe the bug you encountered:

It takes nearly 10x longer to search an empty directory with fd 8.7 or 8.5 versus 8.4

> mkdir empty
> hyperfine -N 'fd-v8.4.0-x86_64-unknown-linux-gnu/fd . empty/' 'fd-v8.5.0-x86_64-unknown-linux-gnu/fd . empty/' 'fd-v8.7.0-x86_64-unknown-linux-gnu/fd . empty/'
Benchmark 1: fd-v8.4.0-x86_64-unknown-linux-gnu/fd . empty/
  Time (mean ± σ):       4.0 ms ±   0.4 ms    [User: 2.2 ms, System: 2.6 ms]
  Range (min … max):     3.3 ms …   5.9 ms    594 runs

Benchmark 2: fd-v8.5.0-x86_64-unknown-linux-gnu/fd . empty/
  Time (mean ± σ):      39.2 ms ±   2.3 ms    [User: 3.9 ms, System: 36.2 ms]
  Range (min … max):    36.8 ms …  56.7 ms    79 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 3: fd-v8.7.0-x86_64-unknown-linux-gnu/fd . empty/
  Time (mean ± σ):      38.4 ms ±   2.9 ms    [User: 3.5 ms, System: 35.9 ms]
  Range (min … max):    35.5 ms …  55.3 ms    81 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  'fd-v8.4.0-x86_64-unknown-linux-gnu/fd . empty/' ran
    9.68 ± 1.12 times faster than 'fd-v8.7.0-x86_64-unknown-linux-gnu/fd . empty/'
    9.88 ± 1.05 times faster than 'fd-v8.5.0-x86_64-unknown-linux-gnu/fd . empty/'

FWIW this is not an issue with startup time per se; running fd --version does not show any performance drop.

Using -F "" instead of . for the pattern show the same performance regression, if that's helpful to know.

I am happy to run any additional experiments that might be useful, but I'm afraid that my rust-fu is too weak to be of any use for deeper analysis or proposing a fix.

Describe what you expected to happen:

No response

What version of fd are you using?

fd 8.4.0 fd 8.5.0 fd 8.7.0

Which operating system / distribution are you on?

> uname -srm
Linux 5.15.0-76-generic x86_64
> lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.2 LTS
Release:	22.04
Codename:	jammy
@moshelooks moshelooks added the bug label Aug 4, 2023
@tavianator
Copy link
Collaborator

Is there any slowdown for non-empty directories? Time to search an empty directory is not really something we optimize for, as long as it's still reasonably fast.

I did do a quick perf diff and the difference appears to be caused by the switch to a bounded-capacity channel: 5278405. Looks like the overhead is just from initializing the memory allocated for the channel.

@moshelooks
Copy link
Author

moshelooks commented Aug 9, 2023

Right, what I am reporting here is not about empty directories per se, it is about initialization overhead. I ran a simple test of searching for . in a directory with N empty files in it for N increasing by powers of 10. Up through 10000 files I see basically a constant slowdown of 30 ms e.g. for N=1 it takes ~ 3 ms on 8.4 and ~ 33 ms on later versions, and for N=1000 it takes ~15 ms for 8.3 and 45 ms for later versions.

With N=10,000 what I suppose are some under-the-hood asymptotic improvements start to become noticeable. For N=100,000 the gap closes to 20 ms (100 ms vs. 120 ms) and for N=1,000,000 the asymptotic effect dominates and we see what we would always like to:

  'fd-v8.7.0-x86_64-unknown-linux-gnu/fd . /tmp/bench/1000000-files/' ran
    1.10 ± 0.03 times faster than 'fd-v8.5.0-x86_64-unknown-linux-gnu/fd . /tmp/bench/1000000-files/'
    1.20 ± 0.03 times faster than 'fd-v8.4.0-x86_64-unknown-linux-gnu/fd . /tmp/bench/1000000-files/'

Looking at the linked PR suggests to me that the effect out to be linear in --threads, which it basically is. So the 10x slowdown for trivial searches that I reported is an artifact of my machine having nproc=16. But with --threads=128 (not implausible!) we see

  'fd-v8.4.0-x86_64-unknown-linux-gnu/fd --threads=128 . /tmp/bench/1-files/' ran
   35.42 ± 4.09 times faster than 'fd-v8.7.0-x86_64-unknown-linux-gnu/fd --threads=128 . /tmp/bench/1-files/'

and the respective wall-clock times are 5.5 ms vs. 193.4 ms which is IMO worth optimizing for if not too hard to fix.

I'm not a rust guy, but I am rather curious why it takes so long to initialize memory; even with nproc=128 we're only talking about 2 megabytes, right? Should be ~20x faster.. Maybe a rabbit hole!

@moshelooks
Copy link
Author

oh whoops I just realized that capacity is 2M messages and presumably we are allocation rather more than 1 byte per message, duh...

@tavianator
Copy link
Collaborator

Note that there is also linear overhead just to start N threads, which is significant at -j128. See also #1203.

Also I suspect the overhead is actually more from syscall/VM locking overhead for mmap() than initialization, now that I think about it.

@moshelooks
Copy link
Author

Ah good catch, I suspect that this issue is probably a duplicate of #1203 since it looks like that guy is based on timings from 8.4. I certainly won't disagree that starting N threads has linear overhead however this really ought to be negligible for N=128. Please note that in my last comparison of 8.3 (5.5 ms) vs. 8.7 (193.4 ms) I used --threads=128 for both versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants