max_duration_ms setting does not work in Offline scenario #1397

Open

anhappdev opened this issue Jun 15, 2023 · 8 comments
@anhappdev

I wanted to use TestSettings.max_duration_ms to stop a benchmark run when it takes too long to finish (see PR mlcommons/mobile_app_open#718).

The setting works as expected in the SingleStream scenario, but not in the Offline scenario: there the benchmark does not stop and the result is reported as VALID.

Is this behavior expected? Or is it a bug?
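For reference, here is a minimal sketch of how the values in the "Test Parameters Used" blocks below map onto loadgen's public mlperf::TestSettings fields (field names are from loadgen's test_settings.h; the actual wiring in mobile_app_open is more involved):

#include "test_settings.h"  // mlperf::TestSettings from the loadgen sources

// Minimal sketch mirroring the Offline "Test Parameters Used" block below.
mlperf::TestSettings MakeOfflineSettings() {
  mlperf::TestSettings settings;
  settings.scenario = mlperf::TestScenario::Offline;
  settings.mode = mlperf::TestMode::PerformanceOnly;
  settings.offline_expected_qps = 1;  // shows up as target_qps in the log
  settings.min_duration_ms = 100;     // min_duration (ms)
  settings.max_duration_ms = 900;     // max_duration (ms): has no effect here
  settings.min_query_count = 1;
  settings.max_query_count = 0;
  return settings;
}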

Log for SingleStream scenario
================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 16271459
Result is : INVALID
  Min duration satisfied : Yes
  Min queries satisfied : NO
  Early stopping satisfied: NO
Recommendations:
 * The test exited early, before enough queries were issued.
   See the detailed log for why this may have occurred.
Early Stopping Result:
 * Only processed 58 queries.
 * Need to process at least 64 queries for early stopping.

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 62.30
QPS w/o loadgen overhead        : 62.36

Min latency (ns)                : 14380417
Max latency (ns)                : 35529416
Mean latency (ns)               : 16035073
50.00 percentile latency (ns)   : 15707375
90.00 percentile latency (ns)   : 16271459
95.00 percentile latency (ns)   : 18694292
97.00 percentile latency (ns)   : 19629375
99.00 percentile latency (ns)   : 35529416
99.90 percentile latency (ns)   : 35529416

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 100
max_duration (ms): 900
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 10003631887983097364
sample_index_rng_seed : 17183018601990103738
schedule_rng_seed : 12134888396634371638
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 830

No warnings encountered during test.

1 ERROR encountered. See detailed log.
2023-06-15 12:05:54.124221: I flutter/cpp/binary/main.cc:407] Accuracy: 65.38%

Process finished with exit code 0
Log for Offline scenario
================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : Offline
Mode     : PerformanceOnly
Samples per second: 105.135
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: Yes

================================================
Additional Stats
================================================
Min latency (ns)                : 951158708
Max latency (ns)                : 951158708
Mean latency (ns)               : 951158708
50.00 percentile latency (ns)   : 951158708
90.00 percentile latency (ns)   : 951158708
95.00 percentile latency (ns)   : 951158708
97.00 percentile latency (ns)   : 951158708
99.00 percentile latency (ns)   : 951158708
99.90 percentile latency (ns)   : 951158708

================================================
Test Parameters Used
================================================
samples_per_query : 100
target_qps : 1
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 100
max_duration (ms): 900
min_query_count : 1
max_query_count : 0
qsl_rng_seed : 10003631887983097364
sample_index_rng_seed : 17183018601990103738
schedule_rng_seed : 12134888396634371638
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 830

No warnings encountered during test.

No errors encountered during test.

Process finished with exit code 0

The loadgen version used is defined here:
https://github.com/mlcommons/mobile_app_open/blob/4d82d1c186823e38a82867cc4f0116d6bfb1ea8f/WORKSPACE#L118-L129

If you need any other information, please let me know. Thanks.

@anhappdev
Author

Testing with the latest release (v3.0) gave me the same result: the max_duration_ms setting does not work in the Offline scenario.

@arjunsuresh
Contributor

The variable usage in the loadgen code is a bit confusing, so I do not have a full picture of what's happening. But I believe this is the expected behavior, as can be seen here. In the Offline scenario there is only a single query and all samples are part of it (though in some places in the code samples are counted as queries). That is probably why max_duration, like max_query_count, has no effect in the Offline scenario.

But say you want to enforce max_duration = 100000 ms = 100 s: one way to do that is to change target_qps, because in the Offline scenario the number of samples issued (counted as queries in places) is approximately target_qps * 1.1 * min_duration_in_seconds, and all of them always get executed.
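As an illustration of that approximation only (the 1.1 factor and the formula come from this comment, not from reading the loadgen source; all names and numbers below are made up):

#include <cstdio>

// Back-of-the-envelope helper based on the approximation above: the single
// Offline query holds roughly target_qps * 1.1 * min_duration_s samples, and
// the run takes (samples / real_throughput) seconds regardless of what
// max_duration is set to.
double EstimateOfflineRunSeconds(double target_qps, double min_duration_s,
                                 double real_samples_per_second) {
  const double samples_in_query = target_qps * 1.1 * min_duration_s;
  return samples_in_query / real_samples_per_second;
}

// Largest target_qps that keeps the expected run under max_duration_s,
// given a throughput measured in an earlier calibration run.
double MaxTargetQpsForCap(double measured_samples_per_second,
                          double min_duration_s, double max_duration_s) {
  return measured_samples_per_second * max_duration_s / (1.1 * min_duration_s);
}

int main() {
  // Example: a device doing ~100 samples/s, min_duration of 60 s,
  // and a desired cap of 100 s of wall-clock time.
  std::printf("estimated run: %.1f s\n",
              EstimateOfflineRunSeconds(/*target_qps=*/100, 60, 100));
  std::printf("target_qps cap: %.1f\n", MaxTargetQpsForCap(100, 60, 100));
}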

I see that you are using the default (also the minimum) target_qps of 1, so in your case the knob that actually bounds the run length is the min_duration setting.

But if you want to do this transparently for any device, without knowing its performance ahead of time, I don't think that is possible as of now. What we do is run a small test, get the performance of the system, and then use that value for further runs.

@anhappdev
Author

@arjunsuresh Thank you for the information. I will look into the workaround you suggested.
I see there is an AbortTest() in loadgen. Do you think this function will work in the Offline scenario?

@arjunsuresh
Contributor

You're welcome. AbortTest() should work. It is just that on a benchmarking system it cannot be used with a timer, as that can interfere with the performance results in the Offline/Server scenarios.

@anhappdev
Author

It is just that on a benchmarking system it cannot be used with a timer, as that can interfere with the performance results in the Offline/Server scenarios.

Can you explain this a bit more? Since max_duration did not work in the Offline scenario, I want to try AbortTest() instead, but are you saying AbortTest() should not be used in the Offline scenario?

BTW, does it make sense to update/fix loadgen so that max_duration works in the Offline scenario? IMO, max_duration should work in every scenario, right?

@arjunsuresh
Contributor

You can use AbortTest(), but checking the test duration with a "time" function call cannot be done during a benchmark run. Probably a timeout option like this can be used.

In SingleStream and MultiStream a query has 1 or 8 samples respectively. Before issuing the next query, loadgen can do whatever bookkeeping it wants, since that time is not counted as part of the benchmark run. But in the Offline scenario there is only a single query containing all the samples, so there is no free moment for loadgen to call the time function and check max_duration.
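For what it's worth, here is a rough sketch of that timeout idea, assuming the mlperf::StartTest()/mlperf::AbortTest() declarations from loadgen's loadgen.h (everything else is illustrative). The watchdog thread lives entirely outside loadgen, so no extra time checks happen inside the timed Offline query. Whether AbortTest() cleanly unblocks StartTest() in practice is a separate question (see below).

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

#include "loadgen.h"        // mlperf::StartTest, mlperf::AbortTest
#include "test_settings.h"  // mlperf::TestSettings, mlperf::LogSettings

// Sketch: run StartTest() on the calling thread while a watchdog thread
// waits for a wall-clock deadline and calls AbortTest() if the test is
// still running when the deadline passes.
void RunWithDeadline(mlperf::SystemUnderTest* sut,
                     mlperf::QuerySampleLibrary* qsl,
                     const mlperf::TestSettings& test_settings,
                     const mlperf::LogSettings& log_settings,
                     std::chrono::seconds deadline) {
  std::mutex m;
  std::condition_variable cv;
  bool finished = false;

  std::thread watchdog([&] {
    std::unique_lock<std::mutex> lock(m);
    if (!cv.wait_for(lock, deadline, [&] { return finished; })) {
      mlperf::AbortTest();  // deadline passed before StartTest() returned
    }
  });

  mlperf::StartTest(sut, qsl, test_settings, log_settings);

  {
    std::lock_guard<std::mutex> lock(m);
    finished = true;
  }
  cv.notify_one();
  watchdog.join();
}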

@anhappdev
Author

Thank you for the explanation. We will try to implement the AbortTest() function. I will close this issue.

anhappdev closed this as not planned on Jul 10, 2023
@anhappdev
Author

We did try to implement AbortTest() (tracked in mlcommons/mobile_app_open#749), but it did not work as expected. When AbortTest() is called, the app just hangs forever and StartTest() never finishes.

For reference, the implementation can be reviewed here:
mlcommons/mobile_app_open@master...749/abort-running-benchmark

Can someone take a look at whether AbortTest() ever worked? I can only find a test for StartTest(), not for AbortTest().

virtual void RunTest() {
  // Reset per-sample bookkeeping before the run.
  samples_load_count_.resize(total_sample_count_, 0);
  samples_issue_count_.resize(total_sample_count_, 0);
  samples_between_flushes_.resize(1, 0);
  // Blocks until loadgen finishes; in our case this call never returns
  // once AbortTest() has been invoked.
  mlperf::StartTest(this, this, test_settings_, log_settings_);
}
virtual void EndTest() {}

anhappdev reopened this on Jul 25, 2023