Adapt to GPUArrays.jl transition to KernelAbstractions.jl. #461

Merged: 1 commit merged into main from tb/gpuarrays_kernelabstractions on Oct 18, 2024

Conversation

maleadt
Member

@maleadt maleadt commented Oct 17, 2024

No description provided.

Contributor

@github-actions github-actions bot left a comment

Metal Benchmarks

| Benchmark suite | Current: 1273856 | Previous: 100f831 | Ratio |
| --- | --- | --- | --- |
| private array/construct | 26750 ns | 26340.333333333332 ns | 1.02 |
| private array/broadcast | 469291.5 ns | 465791 ns | 1.01 |
| private array/random/randn/Float32 | 841854.5 ns | 827937.5 ns | 1.02 |
| private array/random/randn!/Float32 | 632500 ns | 635250 ns | 1.00 |
| private array/random/rand!/Int64 | 567958 ns | 562291.5 ns | 1.01 |
| private array/random/rand!/Float32 | 574083 ns | 594500 ns | 0.97 |
| private array/random/rand/Int64 | 788667 ns | 801791 ns | 0.98 |
| private array/random/rand/Float32 | 608667 ns | 591208 ns | 1.03 |
| private array/copyto!/gpu_to_gpu | 668750 ns | 645500 ns | 1.04 |
| private array/copyto!/cpu_to_gpu | 619500 ns | 618187.5 ns | 1.00 |
| private array/copyto!/gpu_to_cpu | 812875.5 ns | 798500 ns | 1.02 |
| private array/accumulate/1d | 1396750 ns | 1333083 ns | 1.05 |
| private array/accumulate/2d | 1428000 ns | 1424500 ns | 1.00 |
| private array/iteration/findall/int | 2178958 ns | 2100167 ns | 1.04 |
| private array/iteration/findall/bool | 1914000 ns | 1847000 ns | 1.04 |
| private array/iteration/findfirst/int | 1727167 ns | 1696166.5 ns | 1.02 |
| private array/iteration/findfirst/bool | 1659145.5 ns | 1651958.5 ns | 1.00 |
| private array/iteration/scalar | 3045187 ns | 3657771 ns | 0.83 |
| private array/iteration/logical | 3330666 ns | 3264437.5 ns | 1.02 |
| private array/iteration/findmin/1d | 1579333 ns | 1565166 ns | 1.01 |
| private array/iteration/findmin/2d | 1336979 ns | 1351333.5 ns | 0.99 |
| private array/reductions/reduce/1d | 1063875 ns | 1063291 ns | 1.00 |
| private array/reductions/reduce/2d | 672791 ns | 695645.5 ns | 0.97 |
| private array/reductions/mapreduce/1d | 1085166.5 ns | 1078084 ns | 1.01 |
| private array/reductions/mapreduce/2d | 677500 ns | 705166 ns | 0.96 |
| private array/permutedims/4d | 2911500 ns | 860084 ns | 3.39 |
| private array/permutedims/2d | 1065021 ns | 862229.5 ns | 1.24 |
| private array/permutedims/3d | 1629229 ns | 919520.5 ns | 1.77 |
| private array/copy | 577083 ns | 574854 ns | 1.00 |
| latency/precompile | 4529354083 ns | 4396587542 ns | 1.03 |
| latency/ttfp | 6845760958.5 ns | 6698494124.5 ns | 1.02 |
| latency/import | 882184000 ns | 722852834 ns | 1.22 |
| integration/metaldevrt | 735937.5 ns | 719875 ns | 1.02 |
| integration/byval/slices=1 | 1527250 ns | 1530167 ns | 1.00 |
| integration/byval/slices=3 | 9243250.5 ns | 9115541.5 ns | 1.01 |
| integration/byval/reference | 1541021 ns | 1520271 ns | 1.01 |
| integration/byval/slices=2 | 2508667 ns | 2666416 ns | 0.94 |
| kernel/indexing | 481583 ns | 468541 ns | 1.03 |
| kernel/indexing_checked | 464583 ns | 461292 ns | 1.01 |
| kernel/launch | 8875 ns | 8834 ns | 1.00 |
| metal/synchronization/stream | 14958 ns | 14583 ns | 1.03 |
| metal/synchronization/context | 15042 ns | 15250 ns | 0.99 |
| shared array/construct | 25541.666666666668 ns | 26069.5 ns | 0.98 |
| shared array/broadcast | 472500 ns | 468333 ns | 1.01 |
| shared array/random/randn/Float32 | 772791 ns | 785583 ns | 0.98 |
| shared array/random/randn!/Float32 | 660500 ns | 626541.5 ns | 1.05 |
| shared array/random/rand!/Int64 | 566750 ns | 564084 ns | 1.00 |
| shared array/random/rand!/Float32 | 590292 ns | 598792 ns | 0.99 |
| shared array/random/rand/Int64 | 769583 ns | 788666 ns | 0.98 |
| shared array/random/rand/Float32 | 628146 ns | 629791 ns | 1.00 |
| shared array/copyto!/gpu_to_gpu | 85708 ns | 96916 ns | 0.88 |
| shared array/copyto!/cpu_to_gpu | 89792 ns | 88583 ns | 1.01 |
| shared array/copyto!/gpu_to_cpu | 83667 ns | 83458 ns | 1.00 |
| shared array/accumulate/1d | 1417625.5 ns | 1356667 ns | 1.04 |
| shared array/accumulate/2d | 1432791.5 ns | 1421333 ns | 1.01 |
| shared array/iteration/findall/int | 1904333.5 ns | 1792833 ns | 1.06 |
| shared array/iteration/findall/bool | 1677792 ns | 1620166.5 ns | 1.04 |
| shared array/iteration/findfirst/int | 1394021 ns | 1385791 ns | 1.01 |
| shared array/iteration/findfirst/bool | 1374083.5 ns | 1376291 ns | 1.00 |
| shared array/iteration/scalar | 157917 ns | 151458 ns | 1.04 |
| shared array/iteration/logical | 3093958 ns | 3042333 ns | 1.02 |
| shared array/iteration/findmin/1d | 1288583 ns | 1274875 ns | 1.01 |
| shared array/iteration/findmin/2d | 1361333 ns | 1346333 ns | 1.01 |
| shared array/reductions/reduce/1d | 679937 ns | 694458 ns | 0.98 |
| shared array/reductions/reduce/2d | 690875 ns | 702292 ns | 0.98 |
| shared array/reductions/mapreduce/1d | 741979 ns | 754229 ns | 0.98 |
| shared array/reductions/mapreduce/2d | 700041 ns | 705395.5 ns | 0.99 |
| shared array/permutedims/4d | 2933000 ns | 858875 ns | 3.41 |
| shared array/permutedims/2d | 1054250 ns | 862292 ns | 1.22 |
| shared array/permutedims/3d | 1625958 ns | 923916.5 ns | 1.76 |
| shared array/copy | 250166.5 ns | 246583 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt
Member Author

maleadt commented Oct 17, 2024

private array/permutedims/4d 	2929083 ns 	860084 ns 	3.41
shared array/permutedims/4d 	2931166.5 ns 	858875 ns 	3.41

That looks like a big regression. Let's re-start the run to be sure.
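
For readers checking the numbers: the Ratio column appears to be the current timing divided by the previous one, so values above 1 mean the new commit is slower. The sketch below recomputes the ratio for a few rows copied from the benchmark comment and flags the ones above a cutoff. The cutoff of 1.5 and the helper names `ratio` and `regressions` are assumptions made for illustration; they are not part of github-action-benchmark or Metal.jl.

```python
# Rows copied from the benchmark comment: (name, current ns, previous ns).
ROWS = [
    ("private array/permutedims/4d", 2911500.0, 860084.0),
    ("private array/permutedims/3d", 1629229.0, 919520.5),
    ("private array/permutedims/2d", 1065021.0, 862229.5),
    ("latency/import", 882184000.0, 722852834.0),
    ("private array/iteration/scalar", 3045187.0, 3657771.0),
    ("kernel/launch", 8875.0, 8834.0),
]

# Hypothetical cutoff chosen for this example only.
THRESHOLD = 1.5


def ratio(current_ns, previous_ns):
    """Slowdown factor: current time divided by previous time."""
    return current_ns / previous_ns


def regressions(rows, threshold=THRESHOLD):
    """Return (name, ratio rounded to 2 decimals) for rows slower than the cutoff."""
    flagged = []
    for name, current_ns, previous_ns in rows:
        r = ratio(current_ns, previous_ns)
        if r > threshold:
            flagged.append((name, round(r, 2)))
    return flagged


if __name__ == "__main__":
    for name, r in regressions(ROWS):
        print(f"{name}: {r:.2f}x the previous time")
```

With this cutoff only the 4d and 3d permutedims rows are flagged; the 2d case (1.24) and latency/import (1.22) are slower but fall below it.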

@maleadt maleadt force-pushed the tb/gpuarrays_kernelabstractions branch from 840dac4 to 1273856 Compare October 17, 2024 18:39
@maleadt
Member Author

maleadt commented Oct 18, 2024

OK yeah permutedims remains slow, will have a closer look.

@maleadt
Member Author

maleadt commented Oct 18, 2024

Let's track the slowdown here: JuliaGPU/GPUArrays.jl#565

@maleadt maleadt merged commit 711758d into main Oct 18, 2024
2 checks passed
@maleadt maleadt deleted the tb/gpuarrays_kernelabstractions branch October 18, 2024 06:08
christiangnrd added a commit to christiangnrd/Metal.jl that referenced this pull request Oct 18, 2024
christiangnrd added a commit to christiangnrd/Metal.jl that referenced this pull request Oct 18, 2024
2 participants