Binary analysis on CUDA
cuobjdump -ptx <file> | cu++filt
TODO: perform a small binary analysis section on the kernels :D
TODO: cudaGraphDebugDotPrint()
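As a starting point for the TODO above, here is a minimal sketch of how cudaGraphDebugDotPrint() could be called. The kernel noop_kernel, the helper dump_graph, and the output path are hypothetical names chosen for illustration; the graph is built by stream capture only because that is the shortest way to obtain one.

```cuda
#include <cuda_runtime.h>

// Hypothetical empty kernel, only there so the graph has one node.
__global__ void noop_kernel() {}

// Hypothetical helper: capture a single launch into a CUDA graph and
// write its structure to `path` in Graphviz DOT format.
cudaError_t dump_graph(const char* path)
{
    cudaStream_t stream;
    cudaGraph_t graph;
    cudaStreamCreate(&stream);

    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    noop_kernel<<<1, 1, 0, stream>>>();
    cudaStreamEndCapture(stream, &graph);

    // Verbose flag requests all available node details in the output.
    cudaError_t err =
        cudaGraphDebugDotPrint(graph, path, cudaGraphDebugDotFlagsVerbose);

    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    return err;
}
```

Calling dump_graph("graph.dot") should produce a file that Graphviz can render, for example with dot -Tpng graph.dot -o graph.png.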
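Use __noinline__ to perform binary analysis on __device__ functions. Without it, the compiler usually inlines them into the calling kernel, so they do not show up as separate functions in the dump.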
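A minimal sketch of the idea, assuming a toy helper: pick_smaller, min_kernel, and min_on_device are hypothetical names, not taken from the analyzed code. Note that __noinline__ is a hint, so the compiler is asked, not forced, to keep the function separate.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: __noinline__ asks the compiler to keep this as a
// separate function, so it can be located by name in the dump.
__device__ __noinline__ unsigned int pick_smaller(unsigned int a,
                                                  unsigned int b)
{
    return a < b ? a : b;
}

// Hypothetical kernel that calls the helper once per element.
__global__ void min_kernel(const unsigned int* x, const unsigned int* y,
                           unsigned int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = pick_smaller(x[i], y[i]);
}

// Hypothetical host wrapper: runs the kernel on a single pair of values.
unsigned int min_on_device(unsigned int a, unsigned int b)
{
    unsigned int h[3] = {a, b, 0};
    unsigned int* d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    min_kernel<<<1, 1>>>(d, d + 1, d + 2, 1);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[2];
}
```

After compiling this and running the cuobjdump command above, pick_smaller should appear as its own .func in the PTX, with a call instruction inside min_kernel. Removing __noinline__ and comparing the two dumps is a quick way to see the effect.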
See the dump header for information about the compilation, such as the PTX version and the target architecture.
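$L__BB3_4:
    max.u32 %r19, %r22, %r21;        // %r19 = larger of the two indices
    min.u32 %r22, %r22, %r21;        // %r22 = smaller of the two indices
    mul.wide.u32 %rd20, %r19, 4;     // byte offset = %r19 * 4 (32-bit elements)
    add.s64 %rd19, %rd6, %rd20;      // %rd19 = base address (%rd6) + offset
    //
    fence.sc.gpu;                    // sequentially consistent fence, GPU scope
    //
    //
    atom.cas.acquire.gpu.b32 %r21,[%rd19],%r19,%r22;  // if [%rd19] == %r19, store %r22; %r21 = old value
    //
    setp.ne.s32 %p4, %r19, %r21;     // %p4 = (old value != expected value)
    @%p4 bra $L__BB3_4;              // compare-and-swap failed: retry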
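One plausible source-level reading of the loop above is a compare-and-swap retry loop that points the larger of two indices at the smaller one in an array of 32-bit values. The sketch below is a reconstruction under that assumption, not the original source: link_roots, link_kernel, parent_after_link, and the parent array are hypothetical names. It uses plain atomicCAS, which does not reproduce the fence.sc and .acquire semantics seen in the dump.

```cuda
#include <cuda_runtime.h>

// Hypothetical reconstruction: link indices a and b by making the larger
// one point at the smaller one, retrying while another thread interferes.
__device__ __noinline__ void link_roots(unsigned int* parent,
                                        unsigned int a, unsigned int b)
{
    unsigned int lo = a;    // plays the role of %r22
    unsigned int obs = b;   // plays the role of %r21
    unsigned int hi;        // plays the role of %r19
    do {
        hi = max(lo, obs);
        lo = min(lo, obs);
        // Stores lo only if parent[hi] still holds hi; returns the old value.
        obs = atomicCAS(&parent[hi], hi, lo);
    } while (obs != hi);    // on failure, continue with the observed value
}

__global__ void link_kernel(unsigned int* parent,
                            unsigned int a, unsigned int b)
{
    link_roots(parent, a, b);
}

// Hypothetical host helper: start with parent[i] = i for 8 entries, link
// a and b (both must be < 8), and return the parent of the larger index.
unsigned int parent_after_link(unsigned int a, unsigned int b)
{
    const int n = 8;
    unsigned int h[n];
    for (int i = 0; i < n; ++i) h[i] = i;
    unsigned int* d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    link_kernel<<<1, 1>>>(d, a, b);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[a > b ? a : b];
}
```

Compiling this sketch and dumping its PTX should show the same max/min/cas/branch shape, but with a relaxed atom.cas and no fence, which makes the memory-ordering difference from the dump above easy to spot.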
- Fallin, A., Gonzalez, A., Seo, J., & Burtscher, M. (2023, November). A High-Performance MST Implementation for GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-13).