Add configurable sub-iteration support #114
brendandahl wants to merge 7 commits into GoogleChrome:main from
Conversation
Introduce a `subIterationCount` parameter to control the number of inference steps per benchmark round. Previously, the benchmarks ran a single inference step, which only captured first-run performance. Running multiple inferences lets us also measure peak performance. Since adding more sub-iterations increases the test time, I've also decreased the overall iteration count to keep the full suite time roughly the same (~10 min on my M1 Pro).
CI is sooo slow, these tests run in 2 min on my machine. Guess I'll need to up the timeout or cut out even more tests.
Turns out CI was not slow, the test was hanging on a failure. I've fixed that in #115 |
danleh left a comment:
LGTM with nit. High-level question: Do we expect to choose different sub-iteration-counts per workload in the future? I could imagine choosing, e.g., 100 sub-iterations for WebGPU workloads that process tons of frames, but only 2-3 for heavier tasks, such as processing an audio file. For now, I guess having a single global sub-iteration count is simplest, so fine to keep it as-is, but we might need to change this in the future. (JetStream has different sub-iteration counts, depending on how "heavy" a single iteration is.)
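A sketch of what the per-workload counts danleh describes could look like, purely as an illustration: a lookup table keyed by workload name with a global fallback. All names here are hypothetical, not part of this PR.

```javascript
// Hypothetical per-workload override map, illustrating the reviewer's
// suggestion; falls back to a single global default. Names are invented.
const DEFAULT_SUB_ITERATIONS = 10;
const SUB_ITERATIONS_BY_WORKLOAD = {
  "webgpu-frames": 100,    // many cheap frames per iteration
  "audio-transcribe": 2,   // one heavy pass per iteration
};

function subIterationCountFor(workloadName) {
  return SUB_ITERATIONS_BY_WORKLOAD[workloadName] ?? DEFAULT_SUB_ITERATIONS;
}
```

This mirrors JetStream's approach of scaling the inner-iteration count to how heavy a single iteration is, while keeping the current single-number API as the default path.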
```javascript
export function createSubIteratedSuite(benchmark, subIterationCount) {
  const steps = [];
  for (let i = 0; i < subIterationCount; i++) {
    steps.push(new AsyncBenchmarkStep(`run-${i + 1}`, async () => {
```
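The diff excerpt above is truncated; a minimal self-contained sketch of how such a factory might read in full, with a stand-in `AsyncBenchmarkStep` class and a hypothetical `benchmark.runInference()` method (neither is confirmed by the PR, only inferred from the snippet):

```javascript
// Stand-in for the benchmark framework's step class, for illustration only.
class AsyncBenchmarkStep {
  constructor(name, run) {
    this.name = name;
    this.run = run;
  }
}

// Sketch of the factory from the diff: wraps a benchmark's single
// inference into `subIterationCount` repeated steps so the suite
// exercises warmed-up (peak) performance, not just the first run.
function createSubIteratedSuite(benchmark, subIterationCount) {
  const steps = [];
  for (let i = 0; i < subIterationCount; i++) {
    steps.push(new AsyncBenchmarkStep(`run-${i + 1}`, async () => {
      await benchmark.runInference(); // hypothetical per-step work
    }));
  }
  return { name: benchmark.name, steps };
}
```

Each step gets a distinct name (`run-1`, `run-2`, ...), which is what the naming nit below is about.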
optional nit: As the step name, how about using `subIter-N` or `sub-iteration-N` to stay consistent with the terminology used throughout? Otherwise "run" is a bit of an overloaded term, whereas "sub-iteration" is nicely introduced in this PR and more self-descriptive.