Improve benchmark script (#458)
* Run throughput benchmark multiple times

Signed-off-by: Monthon Klongklaew <monthonk@amazon.com>

* Update name for sequential write direct io job

Signed-off-by: Monthon Klongklaew <monthonk@amazon.com>

* Update benchmark doc

Signed-off-by: Monthon Klongklaew <monthonk@amazon.com>

* Update config for write benchmarks

Signed-off-by: Monthon Klongklaew <monthonk@amazon.com>

---------

Signed-off-by: Monthon Klongklaew <monthonk@amazon.com>
monthonk authored Aug 16, 2023
1 parent 35d23e9 commit d74c745
Showing 4 changed files with 96 additions and 109 deletions.
2 changes: 1 addition & 1 deletion doc/BENCHMARKING.md
@@ -15,7 +15,7 @@ In general, we run each IO operation for 30 seconds against a 100 GiB file. But

***readdir workload*** - we measure how long it takes to run the `ls` command against directories of different sizes. Each directory has no subdirectories and contains a specific number of files, ranging from 100 to 100,000, which we have to create manually using fio and upload to the S3 bucket before running the benchmark. The fio configuration files for creating them can be found at [mountpoint-s3/scripts/fio/create/](../mountpoint-s3/scripts/fio/create).
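
For illustration, such a file set can be generated locally with fio's `create_only` mode and copied to the bucket with the AWS CLI. This is a minimal sketch with placeholder bucket and directory names; the authoritative job files are the ones under the path above:

```sh
# Create 100 small files locally without running any IO against them.
mkdir -p /tmp/files_100
fio --name=create_100 --directory=/tmp/files_100 \
    --nrfiles=100 --filesize=1k --create_only=1
# Upload them so the readdir benchmark can list them through Mountpoint.
aws s3 cp --recursive /tmp/files_100 s3://my-benchmark-bucket/files_100/
```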

***write workload*** - we measure write throughput by using the [dd](https://man7.org/linux/man-pages/man1/dd.1.html) command to simulate sequential write workloads. We plan to use fio in the future for consistency with the other benchmarks, but its current write pattern is not supported by Mountpoint: fio first creates a zero-byte file and closes it, then reopens the file with the `O_RDWR` flag to run the IO workload. To support fio, Mountpoint would have to allow file overwrites and allow opening files with the `O_RDWR` flag.
***write workload*** - we measure write throughput by using fio to simulate sequential write workloads. The fio configuration files for the write workloads can be found at [mountpoint-s3/scripts/fio/write/](../mountpoint-s3/scripts/fio/write).
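
As a sketch of what the benchmark harness automates, one of these job files can also be run by hand against a mounted bucket. The bucket and file names below are placeholders, and the mount command is a simplified version of the one `fs_bench.sh` uses:

```sh
# Mount the bucket (--allow-delete is needed because the jobs set unlink=1).
mount_dir=$(mktemp -d /tmp/fio-XXXXXXXXXXXX)
cargo run --release my-benchmark-bucket "$mount_dir" --allow-delete
# Run the sequential write job and print fio's JSON report to stdout.
fio --thread --output-format=json \
    --directory="$mount_dir" \
    --filename=manual_seq_write_$RANDOM.dat \
    mountpoint-s3/scripts/fio/write/seq_write.fio
# Unmount and clean up.
sudo umount "$mount_dir" && rm -rf "$mount_dir"
```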

### Regression Testing
Our CI runs the benchmark automatically for any new commit to the main branch, and for specific pull requests that we have reviewed and tagged with the **performance** label. Every benchmark in the CI workflow runs on `m5n.24xlarge` EC2 instances (100 Gbps network speed) with Ubuntu 22.04 in us-east-1, against a bucket in us-east-1.
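
To reproduce a run outside CI, the script can be pointed at your own bucket. A minimal sketch with placeholder values; the environment variable names are the ones `fs_bench.sh` reads:

```sh
export S3_BUCKET_NAME=my-benchmark-bucket
export S3_BUCKET_TEST_PREFIX=fs-bench/
export S3_BUCKET_BENCH_FILE=bench100GiB.bin            # large object used by the read jobs
export S3_BUCKET_SMALL_BENCH_FILE=bench5MiB.bin        # object used by the small-file jobs
./mountpoint-s3/scripts/fs_bench.sh
```
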
16 changes: 16 additions & 0 deletions mountpoint-s3/scripts/fio/write/seq_write.fio
@@ -0,0 +1,16 @@
[global]
name=fs_bench
bs=256k
runtime=30s
time_based
group_reporting

[sequential_write]
size=100G
rw=write
ioengine=sync
# do not preallocate space for the file
fallocate=none
# create the file on first open for IO instead of pre-creating it
create_on_open=1
# fsync the file when it is closed, flushing buffered writes
fsync_on_close=1
# delete the file when the job completes
unlink=1
# delay the start of the job by 30 seconds
startdelay=30s
17 changes: 17 additions & 0 deletions mountpoint-s3/scripts/fio/write/seq_write_direct.fio
@@ -0,0 +1,17 @@
[global]
name=fs_bench
bs=256k
runtime=30s
time_based
group_reporting

[sequential_write_direct_io]
size=100G
rw=write
ioengine=sync
# use non-buffered IO (O_DIRECT), bypassing the page cache
direct=1
fallocate=none
create_on_open=1
fsync_on_close=1
unlink=1
startdelay=30s
170 changes: 62 additions & 108 deletions mountpoint-s3/scripts/fs_bench.sh
@@ -33,17 +33,48 @@ results_dir=results
runtime_seconds=30
startdelay_seconds=30
max_threads=4
iteration=10

rm -rf ${results_dir}
mkdir -p ${results_dir}

run_fio_job() {
job_file=$1
bench_file=$2
mount_dir=$3

job_name=$(basename "${job_file}")
job_name="${job_name%.*}"

for i in $(seq 1 $iteration);
do
fio --thread \
--output=${results_dir}/${job_name}_${i}.json \
--output-format=json \
--directory=${mount_dir} \
--filename=${bench_file} \
${job_file}
done

# combine the results and find an average value
jq -n 'reduce inputs.jobs[] as $job (null; .name = $job.jobname | .len += 1 | .value += (if ($job."job options".rw == "read")
then $job.read.bw / 1024
elif ($job."job options".rw == "randread") then $job.read.bw / 1024
elif ($job."job options".rw == "randwrite") then $job.write.bw / 1024
else $job.write.bw / 1024 end)) | {name: .name, value: (.value / .len), unit: "MiB/s"}' ${results_dir}/${job_name}_*.json | tee ${results_dir}/${job_name}_parsed.json
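
# For example (hypothetical numbers): with iteration=2 and a job named
# "seq_read" whose runs report bw=1048576 and bw=2097152 (fio reports bw
# in KiB/s), the reduction above averages (1024 + 2048) / 2 and writes
#   {"name":"seq_read","value":1536,"unit":"MiB/s"}
# to ${results_dir}/seq_read_parsed.json.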

# delete the raw output files
for i in $(seq 1 $iteration);
do
rm ${results_dir}/${job_name}_${i}.json
done
}

read_benchmark () {
jobs_dir=mountpoint-s3/scripts/fio/read

for job_file in "${jobs_dir}"/*.fio; do
mount_dir=$(mktemp -d /tmp/fio-XXXXXXXXXXXX)
job_name=$(basename "${job_file}")
job_name="${job_name%.*}"

echo "Running ${job_name}"

@@ -65,122 +96,45 @@ read_benchmark () {
bench_file=${S3_BUCKET_SMALL_BENCH_FILE}
fi

# run benchmark
fio --thread \
--output=${results_dir}/${job_name}.json \
--output-format=json \
--directory=${mount_dir} \
--filename=${bench_file} \
${job_file}
# run the benchmark
run_fio_job "${job_file}" "${bench_file}" "${mount_dir}"

# unmount file system
sudo umount ${mount_dir}

# cleanup mount directory
rm -rf ${mount_dir}

# parse result
jq -n 'inputs.jobs[] | if (."job options".rw == "read")
then {name: .jobname, value: (.read.bw / 1024), unit: "MiB/s"}
elif (."job options".rw == "randread") then {name: .jobname, value: (.read.bw / 1024), unit: "MiB/s"}
elif (."job options".rw == "randwrite") then {name: .jobname, value: (.write.bw / 1024), unit: "MiB/s"}
else {name: .jobname, value: (.write.bw / 1024), unit: "MiB/s"} end' ${results_dir}/${job_name}.json | tee ${results_dir}/${job_name}_parsed.json

# delete the raw output file
rm ${results_dir}/${job_name}.json
done
}

write_benchmark () {
# mount file system
mount_dir=$(mktemp -d /tmp/fio-XXXXXXXXXXXX)
cargo run --release ${S3_BUCKET_NAME} ${mount_dir} \
--allow-delete \
--prefix=${S3_BUCKET_TEST_PREFIX} \
--max-threads=${max_threads}
mount_status=$?
if [ $mount_status -ne 0 ]; then
echo "Failed to mount file system"
exit 1
fi
sleep $startdelay_seconds

## sequential write
job_name="sequential_write"
bench_file=${mount_dir}/${job_name}_${RANDOM}.dat
dd if=/dev/zero of=$bench_file bs=256k conv=fsync > ${results_dir}/${job_name}.txt 2>&1 &
# get the process ID
dd_pid=$!

sleep $runtime_seconds
# send USR1 signal to print the result
kill -USR1 ${dd_pid}
sleep 0.1
kill ${dd_pid}

throughput_value=$(awk '/copied/ {print $10}' ${results_dir}/${job_name}.txt)
unit=$(awk '/copied/ {print $11}' ${results_dir}/${job_name}.txt)
# convert unit to MiB/s
case "$unit" in
GB/s)
throughput_value=$(awk "BEGIN {print $throughput_value*1000*1000*1000/1024/1024}")
;;
MB/s)
throughput_value=$(awk "BEGIN {print $throughput_value*1000*1000/1024/1024}")
;;
kB/s)
throughput_value=$(awk "BEGIN {print $throughput_value*1000/1024/1024}")
;;
esac

json_data="{\"name\":\"$job_name\",\"value\":$throughput_value,\"unit\":\"MiB/s\"}"
echo $json_data | jq '.' | tee ${results_dir}/${job_name}.json

# clean up the data file and the raw output file
sleep 10
rm $bench_file ${results_dir}/${job_name}.txt


## sequential write with direct IO
job_name="sequential_write_direct_io"
bench_file=${mount_dir}/${job_name}_${RANDOM}.dat
dd if=/dev/zero of=$bench_file bs=256k oflag=direct conv=fsync > ${results_dir}/${job_name}.txt 2>&1 &
# get the process ID
dd_pid=$!

sleep $runtime_seconds
# send USR1 signal to print the result
kill -USR1 ${dd_pid}
sleep 0.1
kill ${dd_pid}

throughput_value=$(awk '/copied/ {print $10}' ${results_dir}/${job_name}.txt)
unit=$(awk '/copied/ {print $11}' ${results_dir}/${job_name}.txt)
# convert unit to MiB/s
case "$unit" in
GB/s)
throughput_value=$(awk "BEGIN {print $throughput_value*1000*1000*1000/1024/1024}")
;;
MB/s)
throughput_value=$(awk "BEGIN {print $throughput_value*1000*1000/1024/1024}")
;;
kB/s)
throughput_value=$(awk "BEGIN {print $throughput_value*1000/1024/1024}")
;;
esac

json_data="{\"name\":\"$job_name\",\"value\":$throughput_value,\"unit\":\"MiB/s\"}"
echo $json_data | jq '.' | tee ${results_dir}/${job_name}.json

# clean up the data file and the raw output file
sleep 10
rm $bench_file ${results_dir}/${job_name}.txt

# unmount file system
sudo umount ${mount_dir}

# cleanup mount directory
rm -rf ${mount_dir}
jobs_dir=mountpoint-s3/scripts/fio/write

for job_file in "${jobs_dir}"/*.fio; do
# mount file system
mount_dir=$(mktemp -d /tmp/fio-XXXXXXXXXXXX)
cargo run --release ${S3_BUCKET_NAME} ${mount_dir} \
--allow-delete \
--prefix=${S3_BUCKET_TEST_PREFIX} \
--max-threads=${max_threads}
mount_status=$?
if [ $mount_status -ne 0 ]; then
echo "Failed to mount file system"
exit 1
fi

# set bench file (derive the job name from the fio job file)
job_name=$(basename "${job_file}")
job_name="${job_name%.*}"
bench_file=${mount_dir}/${job_name}_${RANDOM}.dat

# run the benchmark
run_fio_job "${job_file}" "${bench_file}" "${mount_dir}"

# unmount file system
sudo umount ${mount_dir}

# cleanup mount directory
rm -rf ${mount_dir}
done
}

read_benchmark
