FP slow, dominated by MemcpyD2H #2

@clemisch

When profiling FP, the runtime seems to be dominated by MemcpyD2H, as seen in the Perfetto UI. The actual computations take very little time in between.

[Image: Perfetto trace screenshot]
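For anyone who wants to reproduce a trace like this, here is a minimal sketch of capturing a profile with `jax.profiler.trace`. The function `toy_fp`, the helper `profile_toy_fp`, and the log directory are hypothetical stand-ins, not the actual FP code of this repo. The warm-up call keeps compilation out of the trace, so only the steady-state run is recorded.

```python
import jax
import jax.numpy as jnp
import jax.profiler


@jax.jit
def toy_fp(volume, angles):
    # Hypothetical stand-in for FP: one cheap "projection" per scan step.
    def step(carry, angle):
        return carry, jnp.cos(angle) * volume.sum(axis=0)

    _, sino = jax.lax.scan(step, None, angles)
    return sino  # shape (n_angles, n_cols)


def profile_toy_fp(log_dir="/tmp/jax-trace"):
    # log_dir is an assumed location; any writable directory works.
    volume = jnp.ones((64, 64))
    angles = jnp.linspace(0.0, jnp.pi, 32)

    # Warm-up call so that compilation does not show up in the trace.
    toy_fp(volume, angles).block_until_ready()

    # Record only the steady-state run; block until the device is done
    # so the work falls inside the traced region.
    with jax.profiler.trace(log_dir):
        toy_fp(volume, angles).block_until_ready()
    return log_dir
```

Calling `profile_toy_fp()` writes the trace files under `log_dir`. How they are best opened (Perfetto UI or the TensorBoard profile plugin) depends on the JAX version, so treat that part as something to check locally.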

I think this could be related to JAX dispatching each step of jax.lax.scan from the host, which requires some synchronization at each iteration. This is discussed here and here.

I don't know how to resolve this in pure JAX. Alternatives to the current implementation are vmap'ing instead of scan, or using unroll in scan. IIRC both led to longer runtimes.
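To make the three variants concrete, here is a rough sketch of scan, scan with `unroll`, and vmap over angles. `project_one` is a made-up toy projection (it assumes a square 2D volume) and not the real projector. The unroll factor of 4 is also arbitrary.

```python
import jax
import jax.numpy as jnp


def project_one(volume, angle):
    # Hypothetical toy projection of a square 2D volume, NOT the real FP:
    # blends row sums and column sums depending on the angle.
    w = jnp.cos(angle) ** 2
    return w * volume.sum(axis=0) + (1.0 - w) * volume.sum(axis=1)


def _step(volume):
    # Build a scan body that ignores the carry and emits one projection.
    def step(carry, angle):
        return carry, project_one(volume, angle)

    return step


@jax.jit
def fp_scan(volume, angles):
    # One scan iteration per angle.
    _, sino = jax.lax.scan(_step(volume), None, angles)
    return sino


@jax.jit
def fp_scan_unrolled(volume, angles):
    # Same scan, but 4 iterations are unrolled into each loop step.
    _, sino = jax.lax.scan(_step(volume), None, angles, unroll=4)
    return sino


@jax.jit
def fp_vmap(volume, angles):
    # All angles batched at once; no loop, but higher peak memory.
    return jax.vmap(project_one, in_axes=(None, 0))(volume, angles)


volume = jnp.arange(16.0).reshape(4, 4)
angles = jnp.linspace(0.0, jnp.pi, 8, endpoint=False)
sino_scan = fp_scan(volume, angles)
sino_unrolled = fp_scan_unrolled(volume, angles)
sino_vmap = fp_vmap(volume, angles)
```

All three should give the same sinogram up to floating-point error. They only differ in how the work is scheduled, so each one has to be timed (with `block_until_ready()`) on the real projector to see which is fastest.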

Ultimately we should avoid the Memcpy at each projection, which is what I think is happening. We could swap the levels of scan and vmap, i.e. vmap over angles and scan over detector rows. I would expect that to be slower, but we could try it.
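A sketch of what swapping the two levels could look like, on a toy problem. It assumes that each detector row depends only on one slice of the volume (a parallel-beam-like simplification that may not hold for the real geometry). `project_row` is a hypothetical placeholder, not the actual projector.

```python
import jax
import jax.numpy as jnp


def project_row(slice_2d, angle):
    # Hypothetical toy projection of one square volume slice onto one
    # detector row; stands in for the real per-row computation.
    w = jnp.cos(angle) ** 2
    return w * slice_2d.sum(axis=0) + (1.0 - w) * slice_2d.sum(axis=1)


@jax.jit
def fp_scan_angles(volume, angles):
    # Outer scan over angles, inner vmap over detector rows.
    def step(carry, angle):
        rows = jax.vmap(project_row, in_axes=(0, None))(volume, angle)
        return carry, rows

    _, sino = jax.lax.scan(step, None, angles)
    return sino  # (n_angles, n_rows, n_cols)


@jax.jit
def fp_scan_rows(volume, angles):
    # Swapped levels: outer scan over detector rows, inner vmap over angles.
    def step(carry, slice_2d):
        per_angle = jax.vmap(project_row, in_axes=(None, 0))(slice_2d, angles)
        return carry, per_angle

    _, sino = jax.lax.scan(step, None, volume)  # (n_rows, n_angles, n_cols)
    return jnp.swapaxes(sino, 0, 1)  # back to (n_angles, n_rows, n_cols)


volume = jnp.arange(3 * 4 * 4, dtype=jnp.float32).reshape(3, 4, 4)
angles = jnp.linspace(0.0, jnp.pi, 5, endpoint=False)
sino_a = fp_scan_angles(volume, angles)
sino_b = fp_scan_rows(volume, angles)
```

With the swap, the number of scan iterations equals the number of detector rows instead of the number of angles. Whether that helps depends on which of the two is smaller and on how well the inner vmap over angles batches in the real projector.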
