
overcompressible data with blockvarpct #98

@jayrajput0

Hello Sven,

Hope you are doing well!

I am running a benchmark on a Ceph cluster; my goal is to evaluate compression overhead with an object workload.

  • I am using --blockvarpct 50 with elbencho to generate compressible data; the zlib compression algorithm is configured on my pools.
  • The data is getting over-compressed; see the last two columns of the default.rgw.buckets.data pool row (a quick ratio check follows the table below).

Explanation of the last two columns:

  • USED COMPR: The amount of space allocated for compressed data; this includes the compressed data plus allocation, replication, and erasure-coding overhead.
  • UNDER COMPR: The amount of data that passed through compression and was beneficial enough to be stored in compressed form.

# ceph df detail
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    503 TiB  495 TiB  7.5 TiB   7.5 TiB       1.50
TOTAL  503 TiB  495 TiB  7.5 TiB   7.5 TiB       1.50

--- POOLS ---
POOL                       ID   PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
.mgr                        1     1   13 MiB   13 MiB      0 B        6   38 MiB   38 MiB      0 B      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.buckets.index  85   256  271 MiB      0 B  271 MiB      415  814 MiB      0 B  814 MiB      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.meta           86    32   42 KiB  4.2 KiB   38 KiB       19  306 KiB  192 KiB  114 KiB      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.log            87    32   28 KiB   28 KiB      0 B      323  2.3 MiB  2.3 MiB      0 B      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.control        88    32      0 B      0 B      0 B        9      0 B      0 B      0 B      0    156 TiB            N/A          N/A    N/A         0 B          0 B
.rgw.root                  89    32   11 KiB   11 KiB      0 B       19  216 KiB  216 KiB      0 B      0    156 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.buckets.data   90  4096  4.5 TiB  4.5 TiB      0 B    1.53M  7.2 TiB  7.2 TiB      0 B   1.52    156 TiB            N/A          N/A    N/A     469 GiB      6.8 TiB
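
A quick check on those two columns (assuming, as the Ceph docs describe, that both USED COMPR and UNDER COMPR already include replication/EC overhead, so their ratio approximates the achieved compression ratio). A minimal Python sketch using the figures from the table above:

# back-of-the-envelope check of the pool stats above
used_compr  = 469 * 2**30   # 469 GiB allocated for compressed data
under_compr = 6.8 * 2**40   # 6.8 TiB passed through compression beneficially
ratio = used_compr / under_compr
print(f"compressed/original ~ {ratio:.3f} (~{1 / ratio:.0f}:1)")
# prints ~0.067, i.e. roughly 15:1, far beyond the ~2:1 expected for 50% compressible data
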
  • I am using the syntax below for benchmarking.
elbencho --hosts $ebHosts --numhosts $numhost \
  --s3endpoints $ebEndpoints --s3key $access --s3secret $secret \
  -r -s $size -b $chunksize -t $threads -n $numhost -N $objects \
  --timelimit $testduration --infloop --nolive \
  --lat --latpercent --latpercent9s 0 --direct \
  --csvfile=pgd-${size}-${threads}t.csv --resfile=pgd-${size}-${threads}t.out \
  --port 13001 --label ${threads}t-$comment --s3ignoreerrors \
  --treescan s3://${size}-${threads}t-${comment}-bucket ${additionalargs} \
  ${size}-${threads}t-${comment}-bucket &>> elbencho-${size}-${threads}t.out
  • Do you think this is due to the data buffer pattern? Any feedback or suggestions to avoid this over-compression? (See the sketch after this list for how I could verify it.)

  • Just as a sanity check, I used fio (obviously a block workload) with buffer_compress_percentage=50 to see how the results look, and I see almost exactly 50% compression there.

  • Note that fio enables refill_buffers by default when buffer_compress_percentage is set, to reduce the likelihood of over-compression.
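
One way to verify whether the write buffer pattern is the cause would be to inspect an actual object written by elbencho and measure how well zlib (the compressor configured on the pools) squeezes it. A minimal Python sketch; the object path and the 64 KiB chunk size are assumptions for illustration, not elbencho or Ceph defaults:

import sys
import zlib

# Read a sample object previously fetched from the cluster (e.g. via an S3 client
# or "rados get") and report the overall zlib compression ratio, chunk by chunk.
path = sys.argv[1] if len(sys.argv) > 1 else "sample_object.bin"   # assumed path
chunk_size = 64 * 1024                                             # assumed chunk size

total_in = total_out = 0
with open(path, "rb") as f:
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        total_in += len(chunk)
        total_out += len(zlib.compress(chunk))

print(f"zlib compressed/original: {total_out / total_in:.2f}")

If this already reports a ratio well below 0.5 for a single object, the buffer fill pattern itself compresses much better than the 50% target suggests; if it sits near 0.5, the discrepancy more likely lies elsewhere.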
