
mapreduce allocates a lot on the CPU #211

Closed
ViralBShah opened this issue Jun 22, 2023 · 3 comments
Labels
arrays Things about the array abstraction. performance Gotta go fast.

Comments

@ViralBShah
Contributor

I was expecting fewer allocations and much faster speed for `sum(::Vector)` on the GPU, but I am not sure what to compare it to.

## GPU

julia> a = Metal.ones(Float32, 10^8);

julia> @time sum(a)
  0.017176 seconds (980 allocations: 23.758 KiB)
1.0f8

## CPU

julia> b = ones(Float32,10^8);

julia> @time sum(b);
  0.014090 seconds (1 allocation: 16 bytes)
@maleadt
Member

maleadt commented Jun 22, 2023

Yeah, mapreduce is known to be slow (#46). We sped it up at some point, but had to revert (JuliaGPU/GPUArrays.jl#454), and I haven't had the time to revisit.

Adding specializations that use MPS might be a good workaround for the common cases.

@maleadt maleadt added performance Gotta go fast. arrays Things about the array abstraction. labels Feb 28, 2024
@maleadt maleadt changed the title sum(vector) allocates a lot and feels slow. mapreduce allocates a lot on the CPU Mar 5, 2024
@maleadt
Member

maleadt commented Mar 5, 2024

Regarding the performance of mapreduce: I think we're fine; see #303 (comment) for a benchmark.


Regarding the allocations: I think we can close this in favor of JuliaInterop/ObjectiveC.jl#13.

Basically, these allocations aren't caused by the mapreduce implementation; they are a consequence of how the ObjectiveC object wrappers are designed (all objects are typed abstractly, resulting in dynamic dispatch everywhere).
For example, with the simplest kernel possible:

julia> f() = @metal identity(nothing)
f (generic function with 2 methods)

julia> @time f()
  0.000177 seconds (55 allocations: 1.578 KiB)
Metal.HostKernel{typeof(identity), Tuple{Nothing}}(identity, Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG15XFamilyComputePipeline))

Because these allocations almost all come from object instances, they are generally small and thus very fast. As such, I don't think this is a performance issue/priority right now.
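The effect of abstractly-typed wrapper fields can be reproduced outside Metal.jl entirely. A minimal sketch (all type names here are hypothetical, not from ObjectiveC.jl): a struct whose field is declared with an abstract type forces dynamic dispatch and boxing on every access, while a concretely-typed field compiles to allocation-free code.

```julia
# Hypothetical stand-ins for the ObjectiveC object wrappers; not actual
# Metal.jl or ObjectiveC.jl types.
abstract type AbstractWrapper end

struct ConcreteWrapper <: AbstractWrapper
    ptr::Ptr{Cvoid}
end

# Holder with an abstractly-typed field: field accesses are type-unstable,
# so calls through it dispatch dynamically and box intermediate values.
struct AbstractField
    obj::AbstractWrapper
end

# Holder with a concretely-typed field: fully inferred, no dispatch.
struct ConcreteField
    obj::ConcreteWrapper
end

rawptr(h) = h.obj.ptr

const a = AbstractField(ConcreteWrapper(C_NULL))
const c = ConcreteField(ConcreteWrapper(C_NULL))
rawptr(a); rawptr(c)  # compile both specializations first

# The abstract-field version typically reports a small per-call allocation
# (the boxed intermediate); the concrete-field version reports none.
@show @allocated rawptr(a)
@show @allocated rawptr(c)
```

These per-call allocations are tiny, which is why they show up as counts in `@time` without dominating runtime, matching what the `@metal identity(nothing)` example above shows.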

@maleadt maleadt closed this as completed Mar 5, 2024