A command-line tool for quickly generating a large number of files across a large number of directories. It creates a directory tree shaped like an M-ary tree and randomly places any number of files of any size within it. Files are distributed roughly equally among the directories. If a size is provided, each file is filled with zeros up to that size.
- Download Binary
sudo wget https://github.com/joshuaboud/gen-dataset/releases/download/v1.3/gen-dataset -P /usr/local/bin
- Mark Executable
sudo chmod +x /usr/local/bin/gen-dataset
- Install Boost Development Libraries
- Get Source and Install
git clone https://github.com/joshuaboud/gen-dataset.git
cd gen-dataset
make -j8
sudo make install
usage:
gen-dataset -c [-b -d -s -S -t -w -y] [path]
flags:
-b, --branches <int> - number of subdirectories per directory
-c, --count <int> - total number of files to create
-d, --depth <int> - number of directory levels
-s, --size <float[K..T][i]B> - file size
-S, --buff-size <float[K..T][i]B> - write buffer size (default=1M)
-t, --threads <int> - number of parallel file creation threads
-w, --max-wait <float (seconds)> - max random wait between file creation
-y, --yes - don't prompt before creating files
Generate 10 1GiB files in a single subdirectory named 'subdir':
gen-dataset -c 10 -s 1GiB subdir
Generate 10,000 1MiB files in 3,905 directories:
gen-dataset -d 5 -b 5 -c 10000 -s 1MiB
Simulate real usage by randomly waiting up to 2.5 seconds between file creations:
gen-dataset -d 4 -b 6 -c 1000 -s 1MiB -w 2.5
Generate 1,000,000 empty files in 55,986 directories, using 16 threads to write the files:
gen-dataset -d 6 -b 6 -c 1000000 -t 16
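The directory counts quoted in the examples above are consistent with a full tree that has `-b` subdirectories per directory over `-d` levels, not counting the target path itself. The sketch below reproduces that arithmetic in POSIX shell. `dir_count` is a hypothetical helper written for illustration; it is not part of gen-dataset, and the formula is inferred from the example figures rather than taken from the tool's source.

```shell
# dir_count BRANCHES DEPTH
# Hypothetical helper (not shipped with gen-dataset).
# Assumes a full tree: total = b + b^2 + ... + b^d, excluding the root path.
dir_count() (
  branches=$1
  depth=$2
  total=0
  level=1   # number of directories on the current level
  i=1
  while [ "$i" -le "$depth" ]; do
    level=$((level * branches))
    total=$((total + level))
    i=$((i + 1))
  done
  echo "$total"
)

dir_count 5 5   # prints 3905, matching the -d 5 -b 5 example
dir_count 6 6   # prints 55986, matching the -d 6 -b 6 example
```

Dividing `-c` by this number gives a rough idea of how many files land in each directory, for instance about 2 to 3 files per directory for 10,000 files with `-d 5 -b 5`.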