Conversation
|
Compute Benchmarks level_zero run (with params: ): |
| "--iterations=1000", | ||
| "--numKernels=100", | ||
| ] | ||
|
No newline at end of file |
|
|
||
| def bin_args(self) -> list[str]: | ||
| return [ | ||
| "--iterations=1000", |
There was a problem hiding this comment.
is that enough iterations for the benchmark to be stable?
| def bin_args(self) -> list[str]: | ||
| return [ | ||
| "--iterations=1000", | ||
| "--numKernels=100", |
There was a problem hiding this comment.
should we add more scenarios with different number of kernels? e.g., with 1 kernel, to see the cost of the whole machinery end-to-end.
|
Compute Benchmarks level_zero run (): SummaryTotal 125 benchmarks in mean. (result is better) Performance change in benchmark groupsRelative perf in group api (9): 99.869%
Relative perf in group memory (4): 100.001%
Relative perf in group miscellaneous (1): 94.154%
Relative perf in group multithread (10): 99.538%
Relative perf in group Velocity-Bench (9): 100.125%
Relative perf in group Runtime (8): 103.487%
Relative perf in group MicroBench (14): 100.154%
Relative perf in group Pattern (10): 100.568%
Relative perf in group ScalarProduct (6): 100.305%
Relative perf in group USM (7): 97.735%
Relative perf in group VectorAddition (3): 101.026%
Relative perf in group Polybench (3): 100.147%
Relative perf in group Kmeans (1): 100.025%
Relative perf in group LinearRegressionCoeff (1): 101.258%
Relative perf in group MolecularDynamics (1): 103.571%
Relative perf in group llama.cpp (6): 100.287%
Relative perf in group alloc/max (20): 102.226%
Relative perf in group multiple (12): 101.311%
DetailsBenchmark details - environment, command, output...api_overhead_benchmark_l0 SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type miscellaneous_benchmark_sycl VectorSumEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without eventsEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without eventsEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel out of order CPU countEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in order CPU countEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type Velocity-Bench HashtableEnvironment Variables:Command:/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify Output:hashtable - total time for whole calculation: 0.353551 s Velocity-Bench BitcrackerEnvironment Variables:Command:/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000 Output:---------> BitCracker: BitLocker password cracking tool <--------- ==================================
|
|
Compute Benchmarks level_zero run (with params: ): |
| MemcpyExecute(self, 10, 16, 1024, 10000, 0, 1, 1), | ||
| MemcpyExecute(self, 4096, 1, 1024, 10, 0, 1, 0), | ||
| MemcpyExecute(self, 4096, 4, 1024, 10, 0, 1, 0), | ||
| GraphApiSinKernelSYCL(self, 0, 1), |
There was a problem hiding this comment.
hm, this might be too much :D
| super().__init__(bench, "graph_api_benchmark_sycl", "SinKernel") | ||
|
|
||
| def name(self): | ||
| return f"graph_api_benchmark_sycl SinKernel" |
There was a problem hiding this comment.
names need to reflect arguments, otherwise the benchmarks won't be unique in output.
This comment was marked as outdated.
This comment was marked as outdated.
pbalcer
left a comment
There was a problem hiding this comment.
The graph benchmarks didn't run because compute-benchmarks is too old. You need to update the commit being used (line 25).
|
Compute Benchmarks level_zero run (with params: ): |
|
Compute Benchmarks level_zero run (): |
|
Compute Benchmarks level_zero run (with params: ): |
This comment was marked as outdated.
This comment was marked as outdated.
|
Compute Benchmarks level_zero run (with params: ): |
This comment was marked as outdated.
This comment was marked as outdated.
|
Compute Benchmarks level_zero run (with params: ): |
This comment was marked as outdated.
This comment was marked as outdated.
|
Compute Benchmarks level_zero run (with params: ): |
| MemcpyExecute(self, 10, 16, 1024, 10000, 0, 1, 1), | ||
| MemcpyExecute(self, 4096, 1, 1024, 10, 0, 1, 0), | ||
| MemcpyExecute(self, 4096, 4, 1024, 10, 0, 1, 0), | ||
| GraphApiSinKernelGraphSYCL(self, 0, 10), |
There was a problem hiding this comment.
Please pick max 2/3 scenarios per benchmark. We need to keep the runtime of the whole job reasonable (I'm aiming for <30 minutes).
|
|
||
| def bin_args(self) -> list[str]: | ||
| return [ | ||
| "--iterations=100", |
There was a problem hiding this comment.
Is this enough iterations for the benchmarks to have reproducible results?
We aim for the stddev between runs (or, rather, the coefficient of variation of all the runs) to be smaller than 2%.
|
This failed with: If this keeps happening after you've reduced the number of scenarios, I suggest we temporarily remove output (just comment it out) from the markdown. I plan on eventually creating an HTML file per PR and then link to it in the markdown, and that will give us the ability to have longer content with all the details. |
|
Compute Benchmarks level_zero run (with params: --filter graph): |
|
Compute Benchmarks level_zero run (--filter graph): SummaryNo diffs to calculate performance change (result is better) Performance change in benchmark groupsRelative perf in group graph (3): cannot calculate
Relative perf in group api (9): cannot calculate
Relative perf in group memory (4): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (10): cannot calculate
Relative perf in group Velocity-Bench (9): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Relative perf in group alloc/max (20): cannot calculate
Relative perf in group multiple (12): cannot calculate
DetailsBenchmark details - environment, command, output...graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:50Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=50 --withGraphs=0 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0 Output:TestCase,Mean,Median,StdDev,Min,Max,Type |
|
Compute Benchmarks level_zero run (with params: ): |
|
All graph-related benchmarks failed with: and then I think the gpu crashed: Have you seen that before? |
|
Compute Benchmarks level_zero run (): |
dda6187 to
4e5223a
Compare
|
Compute Benchmarks level_zero run (with params: --filter "graph"): |
|
Compute Benchmarks level_zero run (--filter "graph"): SummaryNo diffs to calculate performance change (result is better) Performance change in benchmark groupsRelative perf in group graph (14): cannot calculate
Relative perf in group api (12): cannot calculate
Relative perf in group memory (4): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (10): cannot calculate
Relative perf in group Velocity-Bench (9): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate
Relative perf in group alloc/min (4): cannot calculate
Relative perf in group multiple (12): cannot calculate
DetailsBenchmark details - environment, command, output...graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:50Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=50 --withGraphs=0 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:50Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=50 --withGraphs=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:200Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=200 --withGraphs=0 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:200Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=200 --withGraphs=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:20Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=20 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:20Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=20 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:20Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=20 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:20Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=20 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:200Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=200 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:200Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=200 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:200Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=200 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:200Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=200 Output:TestCase,Mean,Median,StdDev,Min,Max,Type |
4e5223a to
3cc0249
Compare
|
Compute Benchmarks level_zero run (with params: ): |
|
Compute Benchmarks level_zero run (): SummaryTotal 128 benchmarks in mean. (result is better) Performance change in benchmark groupsRelative perf in group api (12): 99.833%
Relative perf in group memory (4): 100.317%
Relative perf in group miscellaneous (1): 107.050%
Relative perf in group multithread (10): 100.223%
Relative perf in group graph (10): cannot calculate
Relative perf in group Velocity-Bench (9): 100.043%
Relative perf in group Runtime (8): 98.949%
Relative perf in group MicroBench (14): 101.666%
Relative perf in group Pattern (10): 100.120%
Relative perf in group ScalarProduct (6): 99.984%
Relative perf in group USM (7): 99.828%
Relative perf in group VectorAddition (3): 96.468%
Relative perf in group Polybench (3): 100.469%
Relative perf in group Kmeans (1): 99.863%
Relative perf in group LinearRegressionCoeff (1): 99.105%
Relative perf in group MolecularDynamics (1): 100.000%
Relative perf in group llama.cpp (6): 99.590%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): 96.175%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): 100.875%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): 104.520%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): 98.834%
Relative perf in group alloc/min (4): 99.819%
Relative perf in group multiple (12): 99.977%
DetailsBenchmark details - environment, command, output...api_overhead_benchmark_l0 SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_l0 SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type miscellaneous_benchmark_sycl VectorSumEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without eventsEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without eventsEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10 Output:TestCase,Mean,Median,StdDev,Min,Max,Type graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel out of order CPU countEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in order CPU countEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU countEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in order with measure completionEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type Velocity-Bench HashtableEnvironment Variables:Command:/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify Output:hashtable - total time for whole calculation: 0.380331 s Velocity-Bench BitcrackerEnvironment Variables:Command:/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000 Output:---------> BitCracker: BitLocker password cracking tool <--------- ==================================
|
No description provided.