
Fix stack overflow that occurred with deeply nested directories #28

Open · wants to merge 1 commit into master
Conversation

Arcterus

This should fix #27. I basically removed rayon and handled the threads manually using crossbeam. I also took the liberty of making the diskus binary use the diskus library instead of just including walk.rs as a module.

@Arcterus
Author

By the way, the only reason I used Cow for storing the directory entries was to avoid changing the public API.

@Arcterus
Author

Also, this is only tangentially related to this PR, but on my relatively old MacBook Pro, running diskus with its default number of threads is significantly slower (more than 2x) than running it with the number reported by num_cpus::get(), at least on the directory that overflowed the stack before. It's still faster than GNU du, though.

@sharkdp
Owner

sharkdp commented May 28, 2019

Thank you very much for your contribution. I did not have time to look into this, but I am planning to do so soon.

@Arcterus
Author

No problem, take your time.

@sharkdp
Owner

sharkdp commented May 31, 2019

I finally had some time to look into this. First of all, thanks again for your PR!

This should fix #27.

Why do you say "should"? Are you able to reproduce the originally filed issue?

I tried to trigger the underlying issue today with this small script:

#!/bin/bash

set -e

base="/tmp/deeply_nested"
rm -rf "$base"

# Build a path with 2034 nested "a" components.
path="$base"

for _ in {1..2034}; do
    path="$path/a"
done

mkdir -p "$path"

# Place a single 1 MiB file at the deepest level.
dd if=/dev/zero of="$path/file_1MB" bs=1024 count=1024

# Compare the totals reported by GNU du and diskus.
(
    cd "$base"
    du -sb
    diskus
)

It creates a very deeply nested directory /tmp/deeply_nested/a/a/a/a/a/…/a/ and puts a 1MB-sized file in it. Up to a depth of 2034, both du and diskus report the same aggregated size without any errors. At depth 2035, the path reaches the 4096 character limit on my filesystem and dd fails to create the 1MB file (File name too long).

I could try to find other filesystems without this limit or try to decrease the stack size of diskus, but I'm not sure if it's worth the effort. Is the stack overflow really a realistic scenario?

By the way, the only reason I used Cow for storing the directory entries was to avoid changing the public API.

Thank you, but I don't think we have to worry about API stability for now. diskus is first and foremost a command-line utility. The additional library is a bonus (for now).

Also, this is only sort of related to this PR, but on my relatively old MacBook Pro, using diskus's default number of threads is significantly slower than the number reported by num_cpus::get() (like more than 2x slower) on the directory that stack overflowed before.

Hm 🙁… I was afraid that the measurements on my machine would not extrapolate to other scenarios. It would be great to get some results from your laptop (only if you find the time). Using hyperfine, you can run:

hyperfine \
  --warmup 2 \
  --export-csv "diskus_benchmark.csv" \
  --export-markdown "diskus_benchmark.md" \
  --parameter-scan j 1 20 \
  "diskus -j {j}"

Concerning your PR: I ran some benchmarks on my machine, and the crossbeam implementation appears to be slightly slower than the current rayon implementation (about 15% on a warm cache, 5% on a cold cache), at least with the default number of threads.

@Arcterus
Author

Why do you say "should"?

I basically meant: barring any bugs that I did not catch.

I could try to find other filesystems without this limit or try to decrease the stack size of diskus, but I'm not sure if it's worth the effort. Is the stack overflow really a realistic scenario?

The issue occurred when running diskus on my home directory (which was the first directory I tested). Several conditions probably combined to cause the overflow: diskus had to handle tons of files, deeply nested directories, some hard links (I think), and so on. In some ways, my home directory is basically a stress test...

Also, AFAIK, the 4096-character limit (PATH_MAX) is usually not an actual limit on directory depth. Linux, for example, can handle much longer paths as long as they are built up through relative operations, although such paths cause problems with some functions.

Thank you, but I don't think we have to worry about API stability for now.

Alright, I’ll remove Cow then.

It would be great to get some results from your laptop (only if you find the time).

I’ll try to test today or tomorrow. I’ll see if I can test it on another computer too.

@micwoj92

micwoj92 commented Jul 8, 2022

This branch has conflicts that must be resolved

Successfully merging this pull request may close these issues.

Stack overflow
3 participants