
mapreduce allocates a lot on the CPU #211

Closed
ViralBShah opened this issue Jun 22, 2023 · 3 comments
Labels
arrays Things about the array abstraction. performance Gotta go fast.

Comments

@ViralBShah
Contributor

I was expecting fewer allocations and much faster speed for `sum(::Vector)` on the GPU, but I am not sure what to compare it to.

## GPU

julia> a = Metal.ones(Float32, 10^8);

julia> @time sum(a)
  0.017176 seconds (980 allocations: 23.758 KiB)
1.0f8

## CPU

julia> b = ones(Float32,10^8);

julia> @time sum(b);
  0.014090 seconds (1 allocation: 16 bytes)
@maleadt
Member

maleadt commented Jun 22, 2023

Yeah, mapreduce is known to be slow (#46). We sped it up at some point, but had to revert (JuliaGPU/GPUArrays.jl#454), and I haven't had the time to revisit.

Adding specializations that use MPS might be a good workaround for the common cases.

@maleadt maleadt added performance Gotta go fast. arrays Things about the array abstraction. labels Feb 28, 2024
@maleadt maleadt changed the title sum(vector) allocates a lot and feels slow. mapreduce allocates a lot on the CPU Mar 5, 2024
@maleadt
Member

maleadt commented Mar 5, 2024

Regarding the performance of mapreduce: I think we're fine; see #303 (comment) for a benchmark.


Regarding the allocations: I think we can close this in favor of JuliaInterop/ObjectiveC.jl#13.

Basically, these allocations aren't caused by the mapreduce implementation; they are a consequence of how the ObjectiveC object wrappers are designed (all objects are typed abstractly, resulting in dynamic dispatch everywhere).
For example, with the simplest kernel possible:

julia> f() = @metal identity(nothing)
f (generic function with 2 methods)

julia> @time f()
  0.000177 seconds (55 allocations: 1.578 KiB)
Metal.HostKernel{typeof(identity), Tuple{Nothing}}(identity, Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG15XFamilyComputePipeline))

Because these allocations almost all come from object instances, they are generally small and thus very fast. As such, I don't think this is a performance issue/priority right now.
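The effect of abstractly-typed wrapper fields can be reproduced outside Metal.jl entirely. A minimal sketch (all type names here are hypothetical, not from ObjectiveC.jl): a struct whose field is declared with an abstract type forces dynamic dispatch and boxing on every access, while a concretely-typed field compiles to allocation-free code.

```julia
# Hypothetical stand-ins for the ObjectiveC object wrappers; not actual
# Metal.jl or ObjectiveC.jl types.
abstract type AbstractWrapper end

struct ConcreteWrapper <: AbstractWrapper
    ptr::Ptr{Cvoid}
end

# Holder with an abstractly-typed field: field accesses are type-unstable,
# so calls through it dispatch dynamically and box intermediate values.
struct AbstractField
    obj::AbstractWrapper
end

# Holder with a concretely-typed field: fully inferred, no dispatch.
struct ConcreteField
    obj::ConcreteWrapper
end

rawptr(h) = h.obj.ptr

const a = AbstractField(ConcreteWrapper(C_NULL))
const c = ConcreteField(ConcreteWrapper(C_NULL))
rawptr(a); rawptr(c)  # compile both specializations first

# The abstract-field version typically reports a small per-call allocation
# (the boxed intermediate); the concrete-field version reports none.
@show @allocated rawptr(a)
@show @allocated rawptr(c)
```

These per-call allocations are tiny, which is why they show up as counts in `@time` without dominating runtime, matching what the `@metal identity(nothing)` example above shows.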

@maleadt maleadt closed this as completed Mar 5, 2024