
overcompressible data with blockvarpct #98

@jayrajput0

Hello Sven,

Hope you are doing well!

I am running a benchmark on a Ceph cluster; my goal is to evaluate compression overhead with an object workload.

  • I am using --blockvarpct 50 with elbencho to generate compressible data; the zlib compression algorithm is configured on my pools.
  • The data is getting over-compressed; see the last two columns of the default.rgw.buckets.data pool row (a quick ratio check follows the table below).

Explanation of the last two columns:

  • USED COMPR: The amount of space allocated for compressed data; this includes the compressed data plus allocation, replication, and erasure-coding overhead.
  • UNDER COMPR: The amount of data that passed through compression and was beneficial enough to be stored in compressed form.

# ceph df detail
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    503 TiB  495 TiB  7.5 TiB   7.5 TiB       1.50
TOTAL  503 TiB  495 TiB  7.5 TiB   7.5 TiB       1.50

--- POOLS ---
POOL                       ID   PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
.mgr                        1     1   13 MiB   13 MiB      0 B        6   38 MiB   38 MiB      0 B      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.buckets.index  85   256  271 MiB      0 B  271 MiB      415  814 MiB      0 B  814 MiB      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.meta           86    32   42 KiB  4.2 KiB   38 KiB       19  306 KiB  192 KiB  114 KiB      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.log            87    32   28 KiB   28 KiB      0 B      323  2.3 MiB  2.3 MiB      0 B      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.control        88    32      0 B      0 B      0 B        9      0 B      0 B      0 B      0    156 TiB            N/A          N/A    N/A         0 B          0 B
.rgw.root                  89    32   11 KiB   11 KiB      0 B       19  216 KiB  216 KiB      0 B      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.buckets.data   90  4096  4.5 TiB  4.5 TiB      0 B    1.53M  7.2 TiB  7.2 TiB      0 B   1.52    156 TiB            N/A          N/A    N/A     469 GiB      6.8 TiB
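
A quick check on those two columns (assuming, as the Ceph docs describe, that both USED COMPR and UNDER COMPR already include replication/EC overhead, so their ratio approximates the achieved compression ratio). A minimal Python sketch using the figures from the table above:

# back-of-the-envelope check of the pool stats above
used_compr  = 469 * 2**30   # 469 GiB allocated for compressed data
under_compr = 6.8 * 2**40   # 6.8 TiB passed through compression beneficially
ratio = used_compr / under_compr
print(f"compressed/original ~ {ratio:.3f} (~{1 / ratio:.0f}:1)")
# prints ~0.067, i.e. roughly 15:1, far beyond the ~2:1 expected for 50% compressible data
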
  • I am using the syntax below for benchmarking.
elbencho --hosts $ebHosts --numhosts $numhost \
  --s3endpoints $ebEndpoints --s3key $access --s3secret $secret \
  -r -s $size -b $chunksize -t $threads -n $numhost -N $objects \
  --timelimit $testduration --infloop --nolive \
  --lat --latpercent --latpercent9s 0 --direct \
  --csvfile=pgd-${size}-${threads}t.csv --resfile=pgd-${size}-${threads}t.out \
  --port 13001 --label ${threads}t-$comment --s3ignoreerrors \
  --treescan s3://${size}-${threads}t-${comment}-bucket ${additionalargs} \
  ${size}-${threads}t-${comment}-bucket &>> elbencho-${size}-${threads}t.out
  • Do you think this is due to the data buffer pattern? Any feedback or suggestions to avoid this over-compression? (See the sketch after this list for how I could verify it.)

  • Just as a sanity check, I used fio (obviously a block workload) with buffer_compress_percentage=50 to see how the results look, and I see almost exactly 50% compression there.

  • Note that fio enables refill_buffers by default when buffer_compress_percentage is set, to reduce the likelihood of over-compression.
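
One way to verify whether the write buffer pattern is the cause would be to inspect an actual object written by elbencho and measure how well zlib (the compressor configured on the pools) squeezes it. A minimal Python sketch; the object path and the 64 KiB chunk size are assumptions for illustration, not elbencho or Ceph defaults:

import sys
import zlib

# Read a sample object previously fetched from the cluster (e.g. via an S3 client
# or "rados get") and report the overall zlib compression ratio, chunk by chunk.
path = sys.argv[1] if len(sys.argv) > 1 else "sample_object.bin"   # assumed path
chunk_size = 64 * 1024                                             # assumed chunk size

total_in = total_out = 0
with open(path, "rb") as f:
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        total_in += len(chunk)
        total_out += len(zlib.compress(chunk))

print(f"zlib compressed/original: {total_out / total_in:.2f}")

If this already reports a ratio well below 0.5 for a single object, the buffer fill pattern itself compresses much better than the 50% target suggests; if it sits near 0.5, the discrepancy more likely lies elsewhere.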
