Hello Sven,
Hope you are doing well!
I am running a benchmark on a Ceph cluster; my goal is to evaluate compression overhead with an object workload.
- I am using `--blockvarpct 50` with elbencho to have compressible data; the zlib compression algorithm is configured on my pools (the pool settings I would double-check are sketched after the `ceph df detail` output below).
- The data is getting over-compressed, see the `default.rgw.buckets.data` pool row, the last two columns.
Explanation for the last two columns:
- USED COMPR: The amount of space allocated for compressed data. This includes the compressed data plus allocation, replication and erasure coding overhead.
- UNDER COMPR: The amount of data passed through compression that was beneficial enough to be stored in a compressed form.
```
# ceph df detail
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 503 TiB 495 TiB 7.5 TiB 7.5 TiB 1.50
TOTAL 503 TiB 495 TiB 7.5 TiB 7.5 TiB 1.50
--- POOLS ---
POOL ID PGS STORED (DATA) (OMAP) OBJECTS USED (DATA) (OMAP) %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
.mgr 1 1 13 MiB 13 MiB 0 B 6 38 MiB 38 MiB 0 B 0 156 TiB N/A N/A N/A 0 B 0 B
default.rgw.buckets.index 85 256 271 MiB 0 B 271 MiB 415 814 MiB 0 B 814 MiB 0 156 TiB N/A N/A N/A 0 B 0 B
default.rgw.meta 86 32 42 KiB 4.2 KiB 38 KiB 19 306 KiB 192 KiB 114 KiB 0 156 TiB N/A N/A N/A 0 B 0 B
default.rgw.log 87 32 28 KiB 28 KiB 0 B 323 2.3 MiB 2.3 MiB 0 B 0 156 TiB N/A N/A N/A 0 B 0 B
default.rgw.control 88 32 0 B 0 B 0 B 9 0 B 0 B 0 B 0 156 TiB N/A N/A N/A 0 B 0 B
.rgw.root 89 32 11 KiB 11 KiB 0 B 19 216 KiB 216 KiB 0 B 0 156 TiB N/A N/A N/A 0 B 0 B
default.rgw.buckets.data 90 4096 4.5 TiB 4.5 TiB 0 B 1.53M 7.2 TiB 7.2 TiB 0 B 1.52 156 TiB N/A N/A N/A 469 GiB 6.8 TiB
```
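From those last two columns: roughly 6.8 TiB of data went through compression but only about 469 GiB was allocated for it, i.e. roughly a 15:1 ratio (and USED COMPR even includes replication/EC overhead), whereas I would expect something closer to 2:1 for 50% compressible data.

For context, these are the pool compression properties I would double-check (a sketch; these are the standard BlueStore per-pool compression options, queried against the data pool from the table above):

```
# Inspect the compression-related properties on the RGW data pool
ceph osd pool get default.rgw.buckets.data compression_algorithm
ceph osd pool get default.rgw.buckets.data compression_mode
ceph osd pool get default.rgw.buckets.data compression_required_ratio
ceph osd pool get default.rgw.buckets.data compression_min_blob_size
ceph osd pool get default.rgw.buckets.data compression_max_blob_size
```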
- I am using the below syntax for benchmarking:

```
elbencho --hosts $ebHosts --numhosts $numhost --s3endpoints $ebEndpoints --s3key $access --s3secret $secret -r -s $size -b $chunksize -t $threads -n $numhost -N $objects --timelimit $testduration --infloop --nolive --lat --latpercent --latpercent9s 0 --direct --csvfile=pgd-${size}-${threads}t.csv --resfile=pgd-${size}-${threads}t.out --port 13001 --label ${threads}t-$comment --s3ignoreerrors --treescan s3://${size}-${threads}t-${comment}-bucket ${additionalargs} ${size}-${threads}t-${comment}-bucket &>> elbencho-${size}-${threads}t.out
```
Do you think it's due to the data buffer pattern? Any feedback or suggestions to avoid this over-compression?
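One local check I can do (a minimal sketch, assuming elbencho can also write the same `--blockvarpct 50` pattern to a plain local file, and using gzip only as a rough stand-in for the pool's zlib compression) is to generate a file outside of Ceph and see how far it compresses:

```
# Write one 1 GiB file locally with the same 50% block variance setting
elbencho -w -s 1g -b 4m --blockvarpct 50 /tmp/blockvar-test

# Compress it and compare the output size against the 1 GiB input;
# ~50% compressible data should stay somewhere around half the original size
gzip -c /tmp/blockvar-test | wc -c
```

If that already compresses far beyond 2:1, the over-compression would come from the generated buffer pattern itself rather than from anything on the Ceph side.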
- Just for a sanity check I used fio (obviously for a block workload) with `buffer_compress_percentage=50` to see how the results look, and I see almost 50% compression there.
- Point to note: fio enables `refill_buffers` by default with `buffer_compress_percentage` to avoid the likelihood of over-compression.
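For reference, the fio run was along these lines (a sketch with a placeholder path and sizes, using the command-line equivalents of the job-file options):

```
# Write ~50% compressible data; as noted above, buffer_compress_percentage
# also makes fio refill buffers so every write gets a fresh pattern
fio --name=compr-sanity --rw=write --bs=64k --size=10g --direct=1 \
    --buffer_compress_percentage=50 --filename=/mnt/test/fio-compr-file
```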