
Higher GET latencies with --treescan #97

@jayrajput0


It looks like I am not allowed to reopen the issue, so I am creating a new one that references it:

Hi @jayrajput0,
you might just want to use the "--treescan" option for the reads/deletes. This way, you can create as many objects as fit into a time-limited write phase. For example, like this to start creating a total of 6400 16MB-sized objects via 64 threads and stop after 4 seconds:

elbencho -s 16m -b 16m -t 64 -w mybucket --s3objprefix testprefix1/ -N 100 --timelimit 4 --s3endpoints ...
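The 6400 figure follows from multiplying the thread count by the objects per thread. A minimal sketch of that arithmetic, assuming one directory per thread when -n is omitted (which is what the 6400 total implies) and reading -s 16m as 16 MiB; total_objects is a hypothetical helper name, and 6400 is only an upper bound because the 4-second time limit can end the phase earlier:

```python
def total_objects(threads, files_per_dir, dirs_per_thread=1, hosts=1):
    """Upper bound on objects written: hosts x threads (-t) x dirs per thread (-n) x files per dir (-N)."""
    return hosts * threads * dirs_per_thread * files_per_dir


count = total_objects(threads=64, files_per_dir=100)  # -t 64 -N 100
size_gib = count * 16 / 1024  # -s 16m, assumed to mean 16 MiB per object
print(count, size_gib)  # 6400 100.0
```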

Now, for the reads, we don't know whether all the objects have been created. For this, we can use the "--treescan" option: it discovers all the objects under a given path and then applies the given operation (e.g. read or delete) to the objects under the given bucket and S3 object prefix. In our example, the treescan bucket/prefix is the same as the bucket/prefix to which we want to apply the operation.
So for time-limited reads it could look like this:

elbencho -s 16m -b 16m -t 64 -r mybucket --s3objprefix testprefix1/ --timelimit 4 --treescan s3://mybucket/testprefix1/ --s3endpoints ...

For deletes, you can use a command line similar to the one for reads with "--treescan", with or without a time limit, depending on whether you want to be sure that all objects get deleted in the end.
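Conceptually, "--treescan" splits the work into a discovery step and an operation step. The toy model below is a hypothetical illustration, not elbencho's implementation: an in-memory set of keys stands in for the bucket, scan_prefix and run_phase are made-up names, and a no-op callback stands in for the GET or DELETE request. It only illustrates the ordering: every key under the prefix is listed first, then a fixed pool of threads works through that list until it is exhausted or the deadline passes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scan_prefix(bucket_keys, prefix):
    """Discovery step: collect every key under the prefix before any I/O starts."""
    return sorted(key for key in bucket_keys if key.startswith(prefix))


def run_phase(keys, operation, num_threads, time_limit_s):
    """Operation step: apply `operation` to the discovered keys with a fixed
    thread pool; keys picked up after the deadline are skipped."""
    deadline = time.monotonic() + time_limit_s

    def worker(key):
        if time.monotonic() >= deadline:
            return None  # time limit reached: leave this key untouched
        operation(key)
        return key

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return [key for key in pool.map(worker, keys) if key is not None]


bucket = {"testprefix1/f0", "testprefix1/f1", "testprefix1/f2", "other/f0"}
keys = scan_prefix(bucket, "testprefix1/")
processed = run_phase(keys, operation=lambda key: None, num_threads=4, time_limit_s=5.0)
print(processed)  # ['testprefix1/f0', 'testprefix1/f1', 'testprefix1/f2']
```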

I'm closing this issue in the hope that this solution works for you. If not, then of course please feel free to re-open it.

Originally posted by @breuner in #80

================================

Hello Sven,
Hope you are doing well!
Sorry, I couldn't validate the suggested --treescan option last time due to time constraints.

Here is my GOAL:
I want to run a time-bound PUT (write) test. It's really difficult to forecast the object count, so I pass a very high object count along with --timelimit.

Issue:
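The problem with the above method is that the GET (read) phase then starts failing with the error "Object download failed", because we don't know which of the requested objects were actually created.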

Suggested solution:
As you suggested, I tested --treescan, but the latencies reported with --treescan are higher, which defeats the purpose.

I performed the following tests:

TEST SCENARIO 01:

  1. Uploading objects with a time limit of 300 s and a very high object count (99999999999) to make sure that the test runs for the full 5 minutes.
# elbencho --hosts xxxx --numhosts 3 --s3endpoints xxxxxxxx --s3key xxx --s3secret xxxxx -w -s 64k -b 64k -t 8 -n 3 -N 99999999999 --nolive --lat --latpercent --latpercent9s 0 --direct --s3nompcheck --s3objprefix X5D --timelimit 300 --csvfile=put-3h.csv --resfile=put-3h.out --port 13001 01test1
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
WRITE       Elapsed time     :    5m0.249s    5m0.254s
            Objects/s        :        5532        5538
            Throughput MiB/s :         345         346
            Total MiB        :      103812      103929
            Objects total    :     1660997     1662856
            Objects latency  : [ min=3.34ms avg=4.33ms max=1.25s ]
            Objects lat % us : [ 1%<=4096 50%<=4096 75%<=4871 99%<=6889 ]
            IO latency       : [ min=3.34ms avg=4.33ms max=1.25s ]
            IO lat % us      : [ 1%<=4096 50%<=4096 75%<=4871 99%<=6889 ]

Terminating due to phase time limit.
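The -N 23095 used in scenario 02 can be derived from this result. A minimal sketch, assuming the total object count is hosts × threads (-t) × directories per thread (-n) × objects per directory (-N); that assumption reproduces the "Objects total" of 1662840 reported in scenario 02. The helper names are hypothetical, and the parser only targets the "Objects total" row format shown above:

```python
import re


def objects_total_last_done(result_text):
    """Return the LAST DONE column of the 'Objects total' row of an elbencho result."""
    for line in result_text.splitlines():
        match = re.search(r"Objects total\s*:\s*(\d+)\s+(\d+)", line)
        if match:
            return int(match.group(2))
    raise ValueError("no 'Objects total' row found")


def files_per_dir(total_objects, num_hosts, threads, dirs_per_thread):
    """Largest -N such that num_hosts * threads * dirs_per_thread * N <= total_objects."""
    return total_objects // (num_hosts * threads * dirs_per_thread)


# Row copied from the scenario 01 WRITE result above.
write_result = "            Objects total    :     1660997     1662856"

written = objects_total_last_done(write_result)
n = files_per_dir(written, num_hosts=3, threads=8, dirs_per_thread=3)  # --numhosts 3 -t 8 -n 3
print(written, n, 3 * 8 * 3 * n)  # 1662856 23095 1662840
```

Rounding down keeps the fixed-count run at or just below the number of objects the time-limited run wrote; here the difference is 16 objects.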

  2. Running GET (read) on the same bucket with --treescan

# elbencho --hosts xxxxx --numhosts 3 --s3endpoints $ebEndpoints --s3key xxx --s3secret xxxx -r -s 64k -b 64k -t 8 -n 3 -N 99999999999 --nolive --lat --latpercent --latpercent9s 0 --direct --s3nompcheck --s3ignoreerrors --s3objprefix X5D --treescan s3://01test1/ --timelimit 300 --csvfile=get-ignore-scan.csv --resfile=get-ignore-scan.out --port 13001 01test1
NOTE: Bucket scan finished. Objects: 1662879; Elapsed: 28s  <----------
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
READ        Elapsed time     :   4m32.451s    5m0.346s
            Objects/s        :        4625        4286
            Objects total    :     1260235     1287520
            Objects latency  : [ min=672us avg=5.48ms max=12.9s ]
            Objects lat % us : [ 1%<=861 50%<=1722 75%<=2435 99%<=110218 ]
            IO latency       : [ min=670us avg=5.48ms max=12.9s ]   <<<<<<<< see latencies
            IO lat % us      : [ 1%<=861 50%<=1722 75%<=2435 99%<=110218 ]

Terminating due to phase time limit.

TEST SCENARIO 02:

  1. Uploading a fixed number of objects without a time limit, to make sure all write operations succeed (using the same number of objects as written in the first test).
# elbencho --hosts xxx --numhosts 3 --s3endpoints xxxxx --s3key xxx --s3secret xxx -w -s 64k -b 64k -t 8 -n 3 -N 23095 --nolive --lat --latpercent --latpercent9s 0 --direct --s3nompcheck --s3objprefix X5D --csvfile=put-3h-objs.csv --resfile=put-3h_objs.out --port 13001 01test3
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
WRITE       Elapsed time     :   4m47.980s   6m15.650s                
            Objects/s        :        5501        4426
            Throughput MiB/s :         343         276
            Total MiB        :       99011      103927
            Objects total    :     1584181     1662840
            Objects latency  : [ min=3.30ms avg=4.38ms max=239ms ]
            Objects lat % us : [ 1%<=4096 50%<=4096 75%<=4871 99%<=6889 ]
            IO latency       : [ min=3.30ms avg=4.38ms max=239ms ]
            IO lat % us      : [ 1%<=4096 50%<=4096 75%<=4871 99%<=6889 ]
  2. Running GET (read) without --treescan
# elbencho --hosts 10.64.8.122,10.64.8.123,10.64.8.124 --numhosts 3 --s3endpoints $ebEndpoints --s3key access --s3secret secret -r -s 64k -b 64k -t 8 -n 3 -N 23095 --nolive --lat --latpercent --latpercent9s 0 --direct --s3nompcheck --s3objprefix X5D --timelimit 300 --csvfile=get.csv --resfile=get.out --port 13001 01test3
OPERATION   RESULT TYPE         FIRST DONE   LAST DONE
=========== ================    ==========   =========
READ        Elapsed time     :   2m15.073s   2m15.634s
            Objects/s        :       12281       12259
            Throughput MiB/s :         767         766
            Total MiB        :      103682      103927
            Objects total    :     1658927     1662840
            Objects latency  : [ min=1.33ms avg=1.95ms max=8.45ms ]
            Objects lat % us : [ 1%<=1722 50%<=2048 75%<=2048 99%<=2435 ]
            IO latency       : [ min=1.33ms avg=1.95ms max=8.45ms ]   <<<<<<<< lower latencies
            IO lat % us      : [ 1%<=1722 50%<=2048 75%<=2048 99%<=2435 ]
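To put numbers on the comparison, a short sketch using the LAST DONE figures from the result blocks above (the scenario 01 write, the --treescan read, and this read). It also applies Little's law, average latency ≈ requests in flight / completion rate, under the assumption that each of the 3 × 8 = 24 threads keeps exactly one request outstanding. Under that assumption the estimate lands within about 0.01 ms of the reported average for the write and the plain read, and within about 0.12 ms for the --treescan read. So with a fixed thread count, the higher latency and the lower objects/s of the --treescan run are two views of the same slowdown; the sketch does not say where that slowdown comes from.

```python
THREADS_IN_FLIGHT = 3 * 8  # --numhosts 3 x -t 8; assumes one request in flight per thread

# LAST DONE figures copied from the result blocks above:
# name: (objects/s, reported avg latency in ms, 99th percentile in us)
RUNS = {
    "write (scenario 01)": (5538, 4.33, 6889),
    "read with --treescan": (4286, 5.48, 110218),
    "read without --treescan": (12259, 1.95, 2435),
}


def littles_law_latency_ms(concurrency, ops_per_sec):
    """Little's law W = L / lambda, converted to milliseconds."""
    return concurrency / ops_per_sec * 1000.0


for name, (ops, avg_ms, p99_us) in RUNS.items():
    estimate = littles_law_latency_ms(THREADS_IN_FLIGHT, ops)
    print(f"{name:24s} avg {avg_ms:.2f} ms, L/lambda {estimate:.2f} ms, p99 {p99_us} us")

scan, plain = RUNS["read with --treescan"], RUNS["read without --treescan"]
print(f"avg latency ratio: {scan[1] / plain[1]:.1f}x")  # 2.8x
print(f"p99 latency ratio: {scan[2] / plain[2]:.1f}x")  # 45.3x
print(f"objects/s ratio:   {plain[0] / scan[0]:.1f}x")  # 2.9x
```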

Queries

  • Is there any mechanism to run a time-bound PUT and GET test without having to forecast the object count? Doing that before benchmarking a system is tedious and, to some extent, impractical.
  • If --treescan is the way to do this, it shouldn't affect the latencies. Are there any other possibilities?

Labels

question (Further information is requested)
